BERT and RoBERTa giving the same outputs

Hi All.

I tried using RoBERTa in two different models. In both of them, I faced the same problem: the model gives the same output for different test inputs during evaluation.

At first, I thought it might be an implementation problem, so I took a small dataset, deliberately overfit the model on it, and then predicted outputs for that same dataset. I still got the same problem: RoBERTa gave the same output for different records.
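Two different failure modes can both look like "the same output": scores that are literally identical for every record, or scores that vary but always favor the same class. A minimal sketch for telling them apart, assuming you have already converted the model's logits on the evaluation set to plain Python lists of floats. `collapse_report` is a hypothetical helper, not part of transformers.

```python
from collections import Counter


def collapse_report(logits, tol=1e-6):
    """Summarise whether a classifier's outputs have collapsed.

    logits: one list of class scores per example (assumed to be plain
    Python floats taken from the model's outputs).
    """
    if not logits:
        raise ValueError("need at least one example")
    # Predicted class = index of the highest score in each row.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    counts = Counter(preds)
    # For each class, the largest score difference between any two examples.
    num_classes = len(logits[0])
    spread = max(
        max(row[c] for row in logits) - min(row[c] for row in logits)
        for c in range(num_classes)
    )
    majority_class, majority_count = counts.most_common(1)[0]
    return {
        "distinct_predictions": len(counts),
        "majority_class": majority_class,
        "majority_fraction": majority_count / len(preds),
        "max_logit_spread": spread,
        # True when the scores do not change with the input at all.
        "identical_outputs": spread <= tol,
    }
```

For example, `collapse_report([[0.2, 0.8], [0.2, 0.8]])` reports one distinct prediction and identical outputs. If only `distinct_predictions` is 1 while `max_logit_spread` is clearly above zero, the scores do depend on the input but the model still picks one class everywhere.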

I replaced RoBERTa with BERT and still got the same issue.

Is there a bug in the latest transformers version, i.e. 4.10.2 (which I believe is very unlikely), or does anyone have other suggestions I can try? I used transformers 4.2.1 earlier and didn't face this problem.

Also, I keep getting this warning during training and evaluation:

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

I checked online and suspect this is not an issue, but I am still not sure what it actually means. Could it be causing the problem?
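The warning lists only `cls.*` weights. In the bert-base-uncased checkpoint these belong to the pretraining heads: `cls.predictions.*` is the masked-language-modelling head and `cls.seq_relationship.*` the next-sentence-prediction head. `BertModel` is the bare encoder without those heads, so those weights have nowhere to go. That is the "IS expected" case in the message, and on its own it should not make outputs identical. The simplified sketch below is not the actual transformers loading code, and its key lists are a small hand-picked subset. It shows why only the `cls.*` keys remain unused once the base-model prefix `bert.` is stripped. `unused_checkpoint_keys` is a hypothetical name.

```python
def unused_checkpoint_keys(checkpoint_keys, model_keys, base_prefix="bert."):
    """Simplified illustration of how unused checkpoint weights arise.

    Not the transformers implementation: it only strips the base-model
    prefix and reports checkpoint keys the target model has no slot for.
    """
    # Encoder weights are stored as "bert.<name>" in the pretraining
    # checkpoint but as "<name>" in the bare BertModel.
    stripped = [
        k[len(base_prefix):] if k.startswith(base_prefix) else k
        for k in checkpoint_keys
    ]
    model_set = set(model_keys)
    return sorted(k for k in stripped if k not in model_set)


# Abbreviated, hand-picked subset of keys (assumption, for illustration).
checkpoint = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.pooler.dense.weight",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
]
bare_encoder = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "pooler.dense.weight",
]
print(unused_checkpoint_keys(checkpoint, bare_encoder))
```

Here the encoder weights all find a matching parameter, and only the two head weights are left over, which mirrors the list in the warning.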


@theguywithblacktie, have you figured out what was wrong with your code? I am facing the same issue while using RoBERTa from the transformers library.

Any solutions?

Having the same issue +1

If I understood the problem correctly, the fine-tuned model always outputs the same value (e.g. the same class for a classification task). I had the same issue and tried to overcome it by tuning hyperparameters. The only hyperparameter that worked was the number of epochs.
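If the number of epochs is what matters, one option is to check prediction diversity after every epoch rather than only at the end of training. A minimal sketch, assuming you can wrap your own training step and inference code in two callables. `train_one_epoch`, `predict`, and `track_prediction_diversity` are hypothetical names, not transformers APIs.

```python
def track_prediction_diversity(train_one_epoch, predict, eval_inputs, num_epochs):
    """Train epoch by epoch and record how many distinct labels are predicted.

    train_one_epoch: hypothetical callable running one pass over the
        training data (your own loop).
    predict: hypothetical callable mapping eval_inputs to a list of labels.
    Returns a list of (epoch, number_of_distinct_predicted_labels).
    """
    history = []
    for epoch in range(1, num_epochs + 1):
        train_one_epoch()
        preds = predict(eval_inputs)
        # A value of 1 means every eval example got the same label.
        history.append((epoch, len(set(preds))))
    return history
```

A history such as `[(1, 1), (2, 1), (3, 2)]` would mean the model only started predicting more than one class in epoch 3.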

I hope it helps,
Christoforos Spartalis