How to select labels for multilabel zero-shot text classification

Hi, I am using the transformers pipeline for zero-shot classification on a large set of more than 1M student reviews of courses conducted in the US and the UK. An example review is below:
“Very nice woman, extremely helpful if you go to her office hours, but scheme is a stupid language which makes this class boring and difficult. Very tough exams. Do good on your labs and projects and you’ll be okay.”
I read that choosing proper labels for zero-shot classification, with many domain-specific words, is key. Can you suggest general rules for creating such labels? Should they be long or short, single domain-specific words or complex sentences? For example, there could be approaches such as:
candidate_labels = ["teaching skills", "interpersonal skills", "grading fairness"]
candidate_labels = ["teacher or professor teaching skills", "teacher or professor interpersonal skills", "course grading fairness"]
candidate_labels = ["teacher or professor good or bad teaching skills", "teacher or professor good or bad interpersonal skills", "course grading fair or unfair"]
I do not have a labelled test set to compare the accuracy, and I would like to avoid labeling a test set, as it is tedious.
Any suggestions? Maybe there are some papers that deal with this problem?
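
One way to compare the three label styles without a labelled set is to run them on a handful of reviews and inspect the scores by eye. Below is a minimal sketch, assuming the `facebook/bart-large-mnli` checkpoint and `multi_label=True` so that each label is scored independently. The names `LABEL_SETS` and `score_label_sets` are made up for illustration and are not part of transformers.

```python
from typing import Dict, List

# The example review from the question.
REVIEW = (
    "Very nice woman, extremely helpful if you go to her office hours, "
    "but scheme is a stupid language which makes this class boring and difficult. "
    "Very tough exams. Do good on your labs and projects and you'll be okay."
)

# The three label wordings from the question, keyed by a hypothetical style name.
LABEL_SETS: Dict[str, List[str]] = {
    "short": ["teaching skills", "interpersonal skills", "grading fairness"],
    "with_subject": [
        "teacher or professor teaching skills",
        "teacher or professor interpersonal skills",
        "course grading fairness",
    ],
    "with_polarity": [
        "teacher or professor good or bad teaching skills",
        "teacher or professor good or bad interpersonal skills",
        "course grading fair or unfair",
    ],
}


def score_label_sets(
    text: str,
    label_sets: Dict[str, List[str]],
    model_name: str = "facebook/bart-large-mnli",  # assumed checkpoint
) -> Dict[str, Dict[str, float]]:
    """Score one text against every label set and return {style: {label: score}}."""
    # Imported here so the label definitions can be used without loading a model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    scores: Dict[str, Dict[str, float]] = {}
    for style, labels in label_sets.items():
        # multi_label=True scores each label on its own instead of
        # normalising the scores across labels.
        result = classifier(text, candidate_labels=labels, multi_label=True)
        scores[style] = dict(zip(result["labels"], result["scores"]))
    return scores
```

Calling `score_label_sets(REVIEW, LABEL_SETS)` downloads the model on first use and returns one score per label for each wording. This only shows how the scores shift with the wording; it does not tell you which wording is more accurate.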


Hello, did you find out how the candidate labels work?

Does anyone have a few basic guidelines to get the labels? I think the main point to improve zero-shot learning is to select good labels, but I didn’t find any resource that addresses this problem.
I was thinking of measuring the distance between the label embeddings and trying to change the wording to maximize it, which could help to improve the accuracy.
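
The distance idea could be sketched roughly as follows. This is only an illustration: `most_similar_pair` and the `embed` argument are hypothetical names, `embed` stands in for whatever sentence-embedding function you choose, and whether pushing labels further apart actually improves zero-shot accuracy is an untested assumption.

```python
import math
from itertools import combinations
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def most_similar_pair(
    labels: Sequence[str],
    embed: Callable[[str], Vector],  # assumed: maps a label to its embedding
) -> Tuple[str, str, float]:
    """Return the two labels whose embeddings are closest, with their similarity.

    The closest pair is the first candidate for rewording if the goal is
    to spread the labels apart.
    """
    vectors = {label: embed(label) for label in labels}
    first, second = max(
        combinations(labels, 2),
        key=lambda pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]),
    )
    return first, second, cosine_similarity(vectors[first], vectors[second])
```

With the sentence-transformers library, `embed` could be the `encode` method of a loaded `SentenceTransformer` model. You would then reword the reported pair and rerun until no two labels are very close.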

Hi, probably not the answer you’re looking for, but SetFit is a great alternative to the zero-shot pipeline. It can work without any labelled data at all, or with as few as 8 labelled examples. This helps with label calibration; although it doesn’t completely avoid label engineering, it does improve model performance. Check out a training example here, which also compares to the zero-shot pipeline.

Alternatively, you may wanna settle for some labels and then compare models. I have found for example that roberta-large-mnli worked a lot better than the default bart-large-mnli on the same labels.


Hi @mabu,

I tried to use SetFit to perform zero-shot classification following this link:

setfit/zero-shot-classification.ipynb at main · huggingface/setfit. However, the example uses a hold-out dataset for evaluation, while my case requires running inference on non-labeled data. Currently I have the trainer ready:

from setfit import SetFitModel, SetFitTrainer

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(

Can I use the trainer directly for inference, instead of pushing it to the HF Hub and reloading the model from the Hub? If yes, could you give me an example of the next step? Thank you!

@miOmiO yes!

You can do it in a couple of ways.
Via __call__: trainer.model([sequence_to_classify])
Via predict: trainer.model.predict([sequence_to_classify])
And I think this is also possible: trainer.model.predict_proba([sequence_to_classify])
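
For the multilabel case, the probabilities can be turned into a set of labels per text with a simple cutoff. This is a rough sketch and not part of SetFit: `labels_above_threshold` and the 0.5 default are hypothetical. It assumes each row holds one probability per label, in the same order as `label_names`.

```python
from typing import List, Sequence


def labels_above_threshold(
    probabilities: Sequence[Sequence[float]],
    label_names: Sequence[str],
    threshold: float = 0.5,  # assumed cutoff; tune it on a few hand-checked reviews
) -> List[List[str]]:
    """For each row of probabilities, keep the labels scoring at or above the threshold."""
    selected: List[List[str]] = []
    for row in probabilities:
        if len(row) != len(label_names):
            raise ValueError("each row needs exactly one probability per label")
        selected.append(
            [name for name, prob in zip(label_names, row) if prob >= threshold]
        )
    return selected
```

The rows would come from something like `trainer.model.predict_proba([sequence_to_classify]).tolist()`, assuming the returned object supports `.tolist()`.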


Thank you so much, I just tried it and it works. @mabu