Pipeline for sentiment classification

Hey everyone! I'm using the transformers pipeline for sentiment classification to classify unlabeled text. Unfortunately, I'm getting some really bad results! For example, the sentence below is classified as negative with 99 percent certainty!
from transformers import pipeline

sent = "The audience here in the hall has promised to remain silent."
sentiment_analysis = pipeline(task="sentiment-analysis")
print(sentiment_analysis(sent))
# output: [{'label': 'NEGATIVE', 'score': 0.9911394119262695}]

Do you know what I can do to get better results for unlabeled text?
I actually tried training a large RoBERTa model on labeled text from Kaggle and I'm getting much better results, but I want to know why the pipeline is performing so badly, and which model it is actually using.

Hi Mitra, I am curious to know the metric performance (e.g. F1) of your trained model compared to the default pipeline. (How much better is the trained RoBERTa?)
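
For example, with a labeled test set you could compute an F1 for both models along these lines (a rough sketch; the example texts and the fine-tuned checkpoint path are placeholders, not real data):

from sklearn.metrics import f1_score
from transformers import pipeline

# Tiny stand-in test set; in practice this would be the labeled Kaggle test split
texts = ["I loved this movie.", "This was a waste of time."]
labels = ["POSITIVE", "NEGATIVE"]

pipe = pipeline("sentiment-analysis")  # default pipeline
# For the fine-tuned model, pass its checkpoint instead, e.g.
# pipe = pipeline("sentiment-analysis", model="path/to/your-finetuned-roberta")
preds = [pred["label"] for pred in pipe(texts)]
print(f1_score(labels, preds, pos_label="POSITIVE"))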


The default model for sentiment analysis is a fine-tuned DistilBERT (distilbert-base-uncased-finetuned-sst-2-english).
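
You can verify which checkpoint and architecture a pipeline loaded by looking at the pipeline object itself; a quick sketch:

from transformers import pipeline

pipe = pipeline("sentiment-analysis")
# The pipeline exposes the underlying model, so you can see what it loaded
print(type(pipe.model).__name__)      # e.g. DistilBertForSequenceClassification
print(pipe.model.config.model_type)   # e.g. "distilbert"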

It's therefore no surprise that RoBERTa performs a lot better.

That being said, I can understand why the model thinks this is negative. "To remain silent" is often uttered in a very negative context ("You have the right to remain silent" when the police arrest someone). Especially for smaller models, this can weigh heavily.
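
One way to check how much that phrase drives the prediction is to compare the original sentence with a paraphrase that avoids it (a quick sketch; the paraphrase is just my own example):

from transformers import pipeline

pipe = pipeline("sentiment-analysis")
# Compare the original phrasing with a paraphrase that avoids "remain silent";
# the difference in scores gives a rough idea of how much weight that phrase carries.
print(pipe("The audience here in the hall has promised to remain silent."))
print(pipe("The audience here in the hall has promised to stay quiet."))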


I'm using these models on unlabeled text, so there is no specific metric for evaluating them on my test set. But among RoBERTa, BERT and XLNet, I got the best results with RoBERTa after only 3 epochs (around 71 percent test accuracy on the Kaggle dataset)! I then used this model to classify the unlabeled text, and after going over the results I didn't see anything like what the pipeline was giving me!


Thanks for responding, Bram!
Exactly, but I was curious why the transformers team isn't using a model like RoBERTa for the pipeline when it can give much better results. I've trained the RoBERTa-large model on the Kaggle sentiment analysis dataset and got to around 71 percent test accuracy with only 3 epochs, and it is giving me much more sensible and accurate results on the unlabeled dataset too.
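
For reference, fine-tuning roberta-large on a labeled CSV for 3 epochs can look roughly like this with the Trainer API (a simplified sketch, not necessarily the exact setup used here; file names, column names and hyperparameters are placeholders):

import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder file names; assumes CSVs with a "text" column and an integer "label" column
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained("roberta-large")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="roberta-large-sentiment",
    num_train_epochs=3,
    per_device_train_batch_size=8,   # placeholder; roberta-large may need a smaller batch or gradient accumulation
    learning_rate=1e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,             # enables dynamic padding via the default data collator
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())            # reports eval loss and accuracy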


Picking a "default" is always difficult. In this particular case, you have to choose on the axis going from "fast" to "accurate". Larger models, like full RoBERTa models, are more accurate but slower. So for demos or simply as a default value, DistilBERT is a good choice.

As a user you can still change which model to use (any model ID from the Hub works), so you can do

pipe = pipeline("sentiment-analysis", "roberta-large-mnli")
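
Note that roberta-large-mnli is RoBERTa fine-tuned on natural language inference, so its labels are entailment/neutral/contradiction rather than sentiment; for sentiment you would pick a checkpoint fine-tuned on sentiment data. A quick sketch, using siebert/sentiment-roberta-large-english purely as an example of such a checkpoint:

from transformers import pipeline

# Example: a RoBERTa-large checkpoint fine-tuned for English sentiment
pipe = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")
print(pipe("The audience here in the hall has promised to remain silent."))
# -> a list with one dict of the form [{'label': ..., 'score': ...}]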

Hi again everyone! I just wanted to thank you all for helping me understand the different models and approaches! I also want to share the Kaggle notebook that I wrote on this subject and the video that I made. I'd really like to hear what you think about the whole thing! Thanks again for all your help!
