Glad you enjoyed the post! Let me clarify.
When we use this pipeline, we are using a model trained on MNLI, including the last layer which predicts one of three labels: `contradiction`, `neutral`, and `entailment`. Since we have a list of candidate labels, each sequence/label pair is fed through the model as a premise/hypothesis pair, and we get out the logits for these three categories for each label. So for a single sequence we end up with a matrix of logits of shape `(num_candidate_labels, 3)`.
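As a minimal sketch of that setup (the sequence, labels, and logit values here are made up, and a random matrix stands in for a real MNLI model's output):

```python
import numpy as np

# Hypothetical example inputs; any sequence and label set works.
sequence = "Who are you voting for in 2020?"
candidate_labels = ["politics", "economics", "public health"]

# Each sequence/label pair is fed to the model as (premise, hypothesis).
pairs = [(sequence, label) for label in candidate_labels]

# Stand-in for the MNLI head's output: one row per candidate label,
# columns ordered [contradiction, neutral, entailment].
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(candidate_labels), 3))
print(logits.shape)  # (3, 3)
```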
When `multi_class=False`, we do a softmax of the `entailment` logits over all the candidate labels, i.e. `logits[:, -1].softmax(dim=0)`. This gives a probability for each label such that they sum to one.
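A small numerical sketch of that step, using numpy in place of the pipeline's torch ops (the toy logit values are invented; columns are ordered [contradiction, neutral, entailment]):

```python
import numpy as np

# Toy logits: 3 candidate labels x [contradiction, neutral, entailment].
logits = np.array([[0.1, 0.2, 2.5],
                   [1.0, 0.3, 0.4],
                   [0.5, 0.5, 0.5]])

# multi_class=False: softmax the entailment column across labels,
# the numpy equivalent of logits[:, -1].softmax(dim=0).
entail = logits[:, -1]
probs = np.exp(entail) / np.exp(entail).sum()
# probs now holds one probability per candidate label, summing to 1.
```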
When `multi_class=True`, we do a softmax over `entailment` vs. `contradiction` for each candidate label independently, i.e. `logits[:, [0, -1]].softmax(dim=1)[:, -1]`. This gives a probability for each candidate label between 0 and 1, but they are independent and do not sum to 1.
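The same toy logits run through the `multi_class=True` path, again with numpy standing in for torch:

```python
import numpy as np

# Toy logits: 3 candidate labels x [contradiction, neutral, entailment].
logits = np.array([[0.1, 0.2, 2.5],
                   [1.0, 0.3, 0.4],
                   [0.5, 0.5, 0.5]])

# multi_class=True: per label, softmax entailment vs. contradiction,
# the numpy equivalent of logits[:, [0, -1]].softmax(dim=1)[:, -1].
pair = logits[:, [0, -1]]          # keep contradiction + entailment columns
exp = np.exp(pair)
probs = (exp / exp.sum(axis=1, keepdims=True))[:, -1]
# Each entry is an independent probability in (0, 1); they need not sum to 1.
```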
As for the hypothesis template, it is a template that formats a candidate label as a sequence. So if you're about to pass a candidate label of `politics` through the model and you have the default hypothesis template of `This example is about {}.`, the model would be fed `This example is about politics.` as the hypothesis.
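In code, the template is just a format string filled with each candidate label in turn:

```python
# The default hypothesis template, filled with str.format per label.
template = "This example is about {}."
hypothesis = template.format("politics")
print(hypothesis)  # This example is about politics.
```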