New pipeline for zero-shot text classification

Another question: I ran the model using pipeline() and got great results:
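For reference, the pipeline call I mean looks roughly like this (the example sentence and labels here are placeholders, not the actual ones I used):

```python
from transformers import pipeline

# The zero-shot pipeline defaults to an NLI model; facebook/bart-large-mnli
# is the one named explicitly in my manual attempt below.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "One day I will see the world",          # placeholder sentence
    candidate_labels=["travel", "cooking"],  # placeholder labels
)
print(result["labels"], result["scores"])
```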

while the manual approach described in Zero-Shot Learning in Modern NLP | Joe Davison Blog, using

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModel.from_pretrained("facebook/bart-large-mnli")

yields different (and terrible) results on the same sentence and labels:

| label | similarity |
| --- | --- |
| nightlife | 0.14027079939842224 |
| arts | 0.12448159605264664 |
| stage | 0.11398478597402573 |
| accommodation | 0.10639678686857224 |
| outdoors | 0.10298262536525726 |
| chat | 0.0851324051618576 |
| fitness | 0.0802810788154602 |
| family | 0.07305140048265457 |
| travel | 0.06645495444536209 |
| food | 0.05090881139039993 |
| sports | 0.04867491126060486 |
| health | 0.046865712851285934 |
| music | 0.04231047257781029 |
| social | 0.03655364364385605 |
| shopping | 0.03481506183743477 |
| events | 0.034809011965990067 |
| fashion | 0.0223409254103899 |
| culture | 0.013726986013352871 |
| misc | -0.01880553364753723 |
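For context, I believe these numbers are cosine similarities between the pooled sentence embedding and each label embedding, as in the blog post's latent-embedding approach (the negative value for "misc" is consistent with that). A minimal sketch of that metric, with toy vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors only -- in the blog's approach, a and b would be
# mean-pooled hidden states for the sentence and the label.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```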

Am I missing something? I’m assuming this is happening because the manual approach isn’t using multi-class classification.