Zero-shot classification fine-tuning

Hello! I’m trying to figure out how to fine-tune bert-base-uncased to perform zero-shot classification. I managed to get a training run working, but I’m not sure I’m preparing the data correctly:

I initialize my model with the `problem_type="multi_label_classification"` setting so it uses a sigmoid-based loss (`BCEWithLogitsLoss`) instead of softmax cross-entropy.
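Concretely, the initialization looks roughly like this (a sketch using the standard `transformers` API; the `num_labels` value here is just a placeholder for my actual label count):

```python
from transformers import AutoModelForSequenceClassification

# problem_type="multi_label_classification" makes the model use
# BCEWithLogitsLoss (independent sigmoid per label) instead of
# softmax cross-entropy over the label set
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=3,  # placeholder: the size of my label set
)
```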

Then I prepare my data in the following way

  • I tokenize the input string and the label together using `tokenizer(sentence, label, truncation=True)` and save the result in the `input_ids` field of my dataset.
  • Then I convert the label to a one-hot vector and keep it in the `labels` field of my dataset.
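In code, the two steps above look roughly like this (the sentence, label set, and values are illustrative stand-ins for my real data):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

label_names = ["sports", "politics", "tech"]  # hypothetical label set
sentence = "The team won the championship last night."
label = "sports"

# Step 1: tokenize the sentence and the label together as a sentence pair
encoded = tokenizer(sentence, label, truncation=True)

# Step 2: one-hot encode the label over the full label set
one_hot = [1.0 if name == label else 0.0 for name in label_names]

example = {"input_ids": encoded["input_ids"], "labels": one_hot}
```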

Finally, I ran training on about 10k lines of annotated data, but the results were essentially nonsense, no better than the untrained model.

Am I on the right path? Should I keep this as a self-supervised fine-tuning task?
Thanks a lot for your help!

Can you tell me how you are fine-tuning it? Could you share the code snippet?


Thanks for your reply! I ended up reading the zero-shot classification paper and realized I had the concept wrong. After reworking my approach, the training succeeded.
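For future readers, the reframing that made it click for me: the NLI-based zero-shot approach turns each (sentence, label) pair into an entailment example, rather than treating it as multi-label classification. A rough sketch of the data construction (the hypothesis template, label set, and function name are illustrative, not from any library):

```python
candidate_labels = ["sports", "politics", "tech"]  # hypothetical label set
sentence = "The team won the championship last night."

def make_nli_examples(sentence, true_label, candidate_labels):
    """Build (premise, hypothesis, entailment) triples: the true label
    becomes an 'entailment' pair (1), every other candidate label
    becomes a 'not entailment' pair (0)."""
    examples = []
    for label in candidate_labels:
        hypothesis = f"This example is about {label}."
        examples.append((sentence, hypothesis, 1 if label == true_label else 0))
    return examples

pairs = make_nli_examples(sentence, "sports", candidate_labels)
# Each pair is then tokenized as (premise, hypothesis) and the model is
# trained as a binary entailment classifier, which is what lets it
# generalize to label names never seen during training.
```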