Hello! I’m trying to figure out how to fine-tune bert-base-uncased to perform zero-shot classification. I managed to get it working but I’m not sure I’m preparing the data in the right way:
I initialize my model with `problem_type="multi_label_classification"` so that it uses a sigmoid-based loss (`BCEWithLogitsLoss`) instead of softmax cross-entropy.
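Concretely, this is roughly how I create the model (`num_labels=3` is just a placeholder for the size of my label set):

```python
from transformers import AutoModelForSequenceClassification

# num_labels is hypothetical here; I set it to the size of my label set.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss
    num_labels=3,
)
```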
Then I prepare my data in the following way (a code sketch follows this list):
- I tokenize the input string and the label together with `tokenizer(sentence, label, truncation=True)` and save the result in the `input_ids` field of my dataset.
- I also convert the label to a one-hot vector and keep it in the `labels` field of my dataset.
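My preprocessing looks roughly like this (the column names and the label set are placeholders for my actual data):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
label_list = ["sports", "politics", "tech"]  # hypothetical label set

def preprocess(example):
    # Tokenize the sentence and its label as a sentence pair:
    # [CLS] sentence [SEP] label [SEP]
    enc = tokenizer(example["sentence"], example["label"], truncation=True)
    # One-hot encode the label as floats (BCEWithLogitsLoss expects floats)
    one_hot = [0.0] * len(label_list)
    one_hot[label_list.index(example["label"])] = 1.0
    enc["labels"] = one_hot
    return enc
```

I then apply this to my dataset with `dataset.map(preprocess)`.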
Finally, I ran the training on about 10k lines of annotated data, but the results were basically nonsense, no better than the untrained model.
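The training itself is just the stock `Trainer`; the hyperparameters below are placeholders for what I actually used:

```python
from transformers import Trainer, TrainingArguments

# Hypothetical settings; train_dataset is the preprocessed dataset from above.
args = TrainingArguments(
    output_dir="bert-zero-shot",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```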
Am I on the right path, or should I be treating this as a self-supervised fine-tuning task instead?
Thanks a lot for your help!