Pipeline for sentiment classification

Hey everyone! I’m using the transformers pipeline for sentiment classification on unlabeled text. Unfortunately, I’m getting some very bad results! For example, the sentence below is classified as negative with 0.99 confidence:
from transformers import pipeline

sent = "The audience here in the hall has promised to remain silent."
sentiment_analysis = pipeline(task="sentiment-analysis")
print(sentiment_analysis(sent))
# output: [{'label': 'NEGATIVE', 'score': 0.9911394119262695}]

Do you know what I can do to get better results for unlabeled text?
I actually tried training a large RoBERTa model on labeled text from Kaggle and I’m getting much better results, but I want to know why the pipeline is performing so badly, and which model it is actually using.

Hi Mitra, I am curious about the metric performance (e.g. F1) of your trained model versus the default pipeline. (How much better is the trained RoBERTa?)


The default model for sentiment analysis is a fine-tuned DistilBERT (distilbert-base-uncased-finetuned-sst-2-english).

It’s therefore no surprise that RoBERTa performs a lot better.
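If you want reproducible behaviour, you can also pin that default checkpoint explicitly rather than relying on the pipeline’s implicit default (which can change between library versions). A minimal sketch:

```python
from transformers import pipeline

# Pin the sentiment-analysis default checkpoint explicitly, so results
# don't silently change if the library's default model is updated.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The audience here in the hall has promised to remain silent."))
```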

That being said, I can understand why the model thinks this is negative. “To remain silent” is often uttered in very negative contexts (“You have the right to remain silent” when the police arrest someone). Especially for smaller models, this can weigh heavily.
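One way to see how much that single phrase sways the model is to probe the pipeline with paraphrases of the same statement. A small sketch (the exact scores will depend on the checkpoint version, so none are shown):

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

# Paraphrases of the same statement: swapping "remain silent" for a
# more neutral wording shows how much that one phrase drives the score.
variants = [
    "The audience here in the hall has promised to remain silent.",
    "The audience here in the hall has promised to stay quiet.",
    "The audience here in the hall has promised not to make noise.",
]
for text, pred in zip(variants, classifier(variants)):
    print(f"{pred['label']:8} {pred['score']:.3f}  {text}")
```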


I’m using these models on unlabeled text, so there is no specific metric for evaluating them on my test set. But among RoBERTa, BERT and XLNet, I got the best results with RoBERTa: around 71 percent test accuracy on the Kaggle dataset after only 3 epochs. When I then used that model to classify the unlabeled text and went over the results, I didn’t see anything like what the pipeline was giving me!


Thanks for responding Bram!
Exactly, but I was curious why the transformers team is not using a model like RoBERTa for the pipeline when it gives so much better results. I’ve trained the RoBERTa-large model on a Kaggle sentiment analysis dataset, reached around 71 percent test accuracy with only 3 epochs, and it gives much more rational and accurate results on the unlabeled dataset too.


Picking a “default” is always difficult. In this particular case, you have to choose along the axis from “fast” to “accurate”. Larger models, like full RoBERTa models, are more accurate but slower. So for demos, or simply as a default value, DistilBERT is a good choice.

As a user you can still change which model to use, so you can do:

pipe = pipeline("sentiment-analysis", model="roberta-large-mnli")

Hi again everyone! I just wanted to thank you all for helping me understand the different models and approaches! I also want to share the Kaggle notebook I wrote on this subject and the video I made. I’d really like to hear what you think about the whole thing! Thanks again for all your help!
