Bart-large-mnli zero-shot learning fine tuning problems

Hello,

I am a beginner with NLP as well as with this subreddit, so sorry if I am formatting my post in an improper way. For work purposes I decided to investigate how I could apply NLP to classify medical paper abstracts into particular groups of dentistry. I identified 8 different classes (‘labels’) that I would use. As a benchmark, I tried out a zero-shot inference model from HuggingFace, namely ‘facebook/bart-large-mnli’. The results were decent yet not very impressive, so I decided to investigate whether I could at least slightly improve them (even on the training data itself) by fine-tuning on ~60 abstracts, each with a particular class provided.
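For context, my understanding is that the zero-shot pipeline works by turning each candidate label into an NLI hypothesis and scoring it against the abstract as the premise. A rough sketch of that framing (the class names are placeholders, not my real 8 labels, and I believe the template below is the pipeline's default, though I may be misremembering it):

```python
# Sketch of how the zero-shot pipeline frames classification as NLI:
# each (premise, hypothesis) pair is scored by the MNLI model, and the
# entailment scores are then normalized across the candidate labels.
labels = ["orthodontics", "periodontology", "endodontics"]  # placeholder classes
template = "This example is {}."  # what I believe is the default hypothesis template

premise = "Some medical paper abstract text..."
pairs = [(premise, template.format(label)) for label in labels]
# pairs[0] -> ("Some medical paper abstract text...", "This example is orthodontics.")
```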

I am to some extent aware of the problems of working with such large models, e.g. I could only train with batch_size = 1-4 without exceeding memory limits. I also haven't played around with hyperparameters much, because I have a feeling they are not the cause of my problems.

So I trained for 1-4 epochs on those 60 abstracts, each with a particular label. Following some suggestions online, I think I managed to preprocess this dataset into a proper NLI problem, i.e. 3 labels (in fact 2: entailment and contradiction), and then increased the dataset size by artificially including contradiction examples: taking the same abstract, providing a wrong hypothesis (i.e. a wrong class), and labeling it 0: contradiction.
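Concretely, my augmentation step looks roughly like this (the class names are placeholders for my 8 dentistry fields, and the label ids follow bart-large-mnli's mapping as far as I know: 0 = contradiction, 1 = neutral, 2 = entailment):

```python
import random

# Placeholder classes; my real label set has 8 dentistry fields.
CLASSES = ["orthodontics", "periodontology", "endodontics"]
TEMPLATE = "This example is {}."

def make_nli_examples(abstract, true_class, rng=random):
    """Turn one labelled abstract into one entailment example plus
    one artificial contradiction example with a wrong class."""
    wrong_class = rng.choice([c for c in CLASSES if c != true_class])
    return [
        # bart-large-mnli label ids (per its config, as I understand it):
        # 0 = contradiction, 2 = entailment
        {"premise": abstract, "hypothesis": TEMPLATE.format(true_class), "label": 2},
        {"premise": abstract, "hypothesis": TEMPLATE.format(wrong_class), "label": 0},
    ]
```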

Unfortunately, regardless of the number of epochs, when I try to predict the field of dentistry with the fine-tuned model via the zero-shot classification pipeline, the output probabilities are all ~0.125, which is almost random guessing. I was aware this might happen because the abstracts are very long and vary in size, so I instead tried running the same training on the paper titles, yet I encountered the same thing: output probabilities converging to random guessing.
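For what it's worth, ~0.125 for every class is exactly what you would see if the model's entailment scores became (near-)identical across all 8 hypotheses, i.e. the fine-tuned model no longer discriminates between the labels at all:

```python
import math

def softmax(logits):
    """Normalize a list of scores into probabilities."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# If fine-tuning collapsed the entailment scores to roughly the same value
# for all 8 labels, the normalized probabilities come out uniform:
probs = softmax([0.7] * 8)
print(probs[0])  # 1/8 = 0.125
```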

Does anyone have an idea what the reason for this could be? As I said, I am a beginner, so it could well be some simple error breaking everything, but I implemented my code to the best of my understanding using whatever limited resources I could find on the HuggingFace forums and Stack Overflow.