New pipeline for zero-shot text classification

Glad you enjoyed the post! Let me clarify.

When we use this pipeline, we are using a model trained on MNLI, including the classification head, which predicts one of three labels: contradiction, neutral, and entailment. Since we have a list of candidate labels, each sequence/label pair is fed through the model as a premise/hypothesis pair, and we get the logits over these three categories for each label. So for a single sequence we end up with a logits matrix of shape `(num_candidate_labels, 3)`.
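
For concreteness, here's a rough sketch of that step. The sequence and labels are just placeholders, and any MNLI-trained checkpoint works the same way:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any MNLI-trained checkpoint works; bart-large-mnli is the pipeline's default.
name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

sequence = "Who are you voting for in 2020?"  # placeholder example
candidate_labels = ["politics", "economics", "sports"]
hypotheses = [f"This example is about {label}." for label in candidate_labels]

# One premise/hypothesis pair per candidate label.
inputs = tokenizer(
    [sequence] * len(candidate_labels),
    hypotheses,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (num_candidate_labels, 3)
```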

When `multi_class=False`, we do a softmax of the entailment logits over all the candidate labels, i.e. `logits[:, -1].softmax(dim=0)`. This gives a probability for each label such that they sum to one.
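
A minimal sketch of that, reusing the `logits` from above (a random tensor stands in so the snippet runs on its own):

```python
import torch

# Stand-in for the (num_candidate_labels, 3) logits computed above.
logits = torch.randn(3, 3)

# multi_class=False: softmax the entailment column across all labels.
probs = logits[:, -1].softmax(dim=0)  # shape: (num_candidate_labels,)
print(probs.sum())  # ~1.0, since the labels compete with each other
```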

When `multi_class=True`, we do a softmax over entailment vs. contradiction for each candidate label independently, i.e. `logits[:, [0, -1]].softmax(dim=1)[:, -1]`. This gives a probability between 0 and 1 for each candidate label, but they are independent and do not sum to 1.
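
And the corresponding sketch for this case:

```python
import torch

# Stand-in for the (num_candidate_labels, 3) logits computed above.
logits = torch.randn(3, 3)

# multi_class=True: per label, softmax over [contradiction, entailment],
# then keep the entailment probability.
probs = logits[:, [0, -1]].softmax(dim=1)[:, -1]
print(probs)  # each value in (0, 1); the values need not sum to 1
```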

As for the hypothesis template, it is a template that formats a candidate label as a sequence. So if you're about to pass a candidate label of `politics` through the model and you have the default hypothesis template of `This example is about {}.`, the model would be fed `This example is about politics.` as the hypothesis.
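
That is, the template is filled in with plain string formatting, roughly:

```python
hypothesis_template = "This example is about {}."
hypothesis = hypothesis_template.format("politics")
print(hypothesis)  # This example is about politics.
```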
