Hey everyone! I'm using the transformers pipeline for sentiment classification to classify unlabeled text. Unfortunately, I'm getting some very poor results! For example, the sentence below is classified as negative with 99 percent certainty!
from transformers import pipeline

sent = "The audience here in the hall has promised to remain silent."
sentimentAnalysis = pipeline(task="sentiment-analysis")
print(sentimentAnalysis(sent))
# output: {'label': 'NEGATIVE', 'score': 0.9911394119262695}
Do you know what I can do to get better results for unlabeled text?
I actually tried training a large RoBERTa model on labeled text from Kaggle and I'm getting much better results, but I want to know why the pipeline is performing so badly, and which model it is actually using?
Hi Mitra, I am curious about the metric performance (e.g. F1) of your trained model versus the default pipeline. (How much better is the trained RoBERTa?)
The default model for the sentiment-analysis pipeline is a fine-tuned DistilBERT (distilbert-base-uncased-finetuned-sst-2-english).
It's therefore no surprise that RoBERTa performs a lot better.
That being said, I can understand why the model thinks this is negative: "to remain silent" is often uttered in a very negative context ("You have the right to remain silent" when the police arrest someone). Especially for smaller models, this can weigh heavily.
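If you want to check which checkpoint a pipeline actually loaded, or swap in a bigger model, something like the sketch below works. The RoBERTa checkpoint id is just an example of a sentiment model from the hub, not a specific recommendation:

from transformers import pipeline

# Inspect which checkpoint the default pipeline loaded
clf = pipeline(task="sentiment-analysis")
print(clf.model.name_or_path)  # shows the fine-tuned DistilBERT checkpoint

# Any sequence-classification model from the hub can be plugged in instead
# (example checkpoint id, pick whatever suits your data)
roberta_clf = pipeline(
    task="sentiment-analysis",
    model="siebert/sentiment-roberta-large-english",
)
print(roberta_clf("The audience here in the hall has promised to remain silent."))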
I'm using these models on unlabeled text, so there is no specific metric for evaluating them on my test set. But of RoBERTa, BERT and XLNet, I got the best results with RoBERTa with only 3 epochs (around 71 percent test accuracy on the Kaggle dataset)! I then used this model to classify the unlabeled text, and after going over the results I didn't see anything like what the pipeline was giving me!
Thanks for responding, Bram!
Exactly, but I was curious why the transformers team is not using a model like RoBERTa for the pipeline when it can give so much better results. I've trained the RoBERTa-large model on a sentiment analysis dataset from Kaggle and got to around 71 percent test accuracy with only 3 epochs, and it gives me much more rational and accurate results on the unlabeled dataset too.
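For reference, this is roughly the shape of the training code I used; the file names, column names and hyperparameters below are placeholders rather than my exact Kaggle setup:

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder CSVs standing in for the Kaggle sentiment data,
# assumed to have a "text" and a "label" column
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)
data = data.rename_column("label", "labels")
data.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

args = TrainingArguments(
    output_dir="roberta-sentiment",
    num_train_epochs=3,               # 3 epochs was enough in my runs
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())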
Picking a "default" is always difficult. In this particular case, you have to choose on the axis going from "fast" to "accurate". Larger models, like full RoBERTa models, are more accurate but slower. So for demos or simply as a default value, DistilBERT is a good choice.
Hi again everyone! I just wanted to thank you all for helping me understand the different models and approaches! I also want to share the Kaggle notebook that I wrote on this subject and the video that I made. I'd really like to hear what you think about the whole thing! Thanks again for all your help!