I have a problem: I trained a model with BERT that scores around 0.90 on the test data, and I decided to use it on new data that has not been annotated. When I run the model, I keep getting the same class as output, and I would like to know why.
Can you help me?
Hey! Just wanted to say that I am facing the same problem and still don’t know why it is happening. I was using RobertaForSequenceClassification on a binary classification problem and it gave 50% accuracy after 5 epochs. It did not learn anything at all.
Hey @emmakelo, it may be a dataset imbalance. Accuracy is generally not a good metric on imbalanced data, so check the precision and recall values.
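To illustrate the point about imbalance, here is a minimal sketch in plain Python. The labels are made up for illustration (9 negatives, 1 positive), and the helper names `accuracy` and `precision_recall` are hypothetical, not from any library. It shows how a model that always predicts the majority class can still reach high accuracy while precision and recall for the minority class are zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Define the metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up, imbalanced example: 9 negatives and 1 positive.
y_true = [0] * 9 + [1]
# A model that always outputs the majority class.
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred, positive=1)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

In practice, `precision_score` and `recall_score` from `sklearn.metrics` do the same job; the hand-written version is only there to keep the sketch dependency-free.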
Thank you!! I solved the problem. It turned out I was not loading the model correctly after training, though I do not know why. The problem is fixed now.
What was the issue when loading it? I think I’m having the same problem. I get exactly the same output from the model no matter what I give it as input.
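from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Load the fine-tuned classifier saved under "custom_model"
loaded_model = AutoModelForSequenceClassification.from_pretrained("custom_model")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
predict_input = tokenizer("Hello world", truncation=True, padding=True, return_tensors="pt")
# Run the loaded model on the tokenized input
predictions = loaded_model(**predict_input)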
I think you need to set a seed before initialising the model.
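For reference, a minimal sketch of what setting a seed could look like. The helper name `seed_everything` is hypothetical; it seeds Python's `random` module and, only if they happen to be installed, NumPy and PyTorch. Note that a seed only makes random initialisation repeatable from run to run; it does not put trained weights into a layer that was randomly initialised.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators that are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Seeding twice with the same value reproduces the same random draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

If `transformers` is installed, its `set_seed` function serves the same purpose in one call. The seed has to be set before the model is created for it to affect the initial weights.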
I tried setting a seed and it worked for a bit, and then it went back to outputting the same thing. Any advice?
Hi, I also have the same problem. I am using the ProtBert model for protein sequence classification. During training it produces different logits for each training sequence, but after model.eval() it produces the same logits for every sequence in the validation dataset. I see the same issue with the ProtT5 model from Hugging Face.
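One way to confirm the symptom described in this thread is to compare the raw logits across several different inputs. The sketch below is plain Python and works on nested lists of floats, which could be obtained from a model output with something like `outputs.logits.tolist()`. The helper names and the example numbers are made up for illustration and are not real model outputs.

```python
from collections import Counter


def logits_look_constant(logit_rows, tol=1e-6):
    """Return True if every row of logits equals the first row within `tol`."""
    if not logit_rows:
        raise ValueError("logit_rows must contain at least one row")
    first = logit_rows[0]
    return all(
        len(row) == len(first)
        and all(abs(a - b) <= tol for a, b in zip(row, first))
        for row in logit_rows[1:]
    )


def predicted_class_counts(logit_rows):
    """Count how often each class index has the largest logit."""
    return Counter(
        max(range(len(row)), key=row.__getitem__) for row in logit_rows
    )


# Hypothetical logits for three different inputs.
eval_logits = [[0.12, -0.34], [0.12, -0.34], [0.12, -0.34]]
train_logits = [[0.50, -0.10], [-0.20, 0.70], [0.10, 0.05]]

print(logits_look_constant(eval_logits), predicted_class_counts(eval_logits))
print(logits_look_constant(train_logits), predicted_class_counts(train_logits))
```

If the logits are identical for clearly different inputs, the model is ignoring its input, which points towards the weights or the input pipeline rather than the decision threshold. If the logits differ but the predicted class is always the same, class imbalance is a more likely explanation.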