I'm fine-tuning a `RobertaForSequenceClassification` model. What does `model.eval()` do for this model? Does it only deactivate the dropout layers?
In other words, are the following two snippets functionally equivalent?
Snippet A:

```python
from transformers import AutoModelForSequenceClassification

model_name = "roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()
```
Snippet B:

```python
from transformers import AutoModelForSequenceClassification, AutoConfig

model_name = "roberta-base"
config = AutoConfig.from_pretrained(
    model_name,
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
    num_labels=1,
)
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
```
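For context, here is my understanding of what `eval()` does, sketched with a tiny stand-in model (plain `torch.nn` modules, not the actual RoBERTa weights): `eval()` just recursively sets the `training` flag to `False` on every submodule, and `nn.Dropout` becomes a no-op when that flag is off.

```python
import torch
from torch import nn

# Tiny stand-in for a transformer block: eval()/train() only flip the
# `training` flag on every submodule; Dropout is the module type whose
# forward behavior depends on that flag (absent BatchNorm etc.).
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.1))

model.eval()
print(all(not m.training for m in model.modules()))  # True

# In eval mode, Dropout passes its input through unchanged:
x = torch.ones(2, 4)
drop = model[1]
print(torch.equal(drop(x), x))  # True
```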
The reason I'm asking is that when training `RobertaForSequenceClassification` (with the default config) on my data, training with `model.train()` doesn't converge, but training with `model.eval()` does (all else being equal). I thought the only difference was that `model.eval()` deactivates the dropout layers in RoBERTa, since there are no BatchNorm layers etc. So I tried manually setting `hidden_dropout_prob=0.0` and `attention_probs_dropout_prob=0.0` as in Snippet B and training with `model.train()`. However, this still doesn't converge, which means Snippet B isn't equivalent to Snippet A. What might be wrong here?
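One check I can run (a sketch, using a toy `nn.Sequential` in place of the real model) is to enumerate every `nn.Dropout` submodule and its `p`, to see whether the config in Snippet B really zeroed all of them; I'm assuming a classification head could carry its own dropout probability that the two config keys don't cover.

```python
from torch import nn

# Diagnostic: list every Dropout submodule and its probability.
# With the real model, pass the loaded RobertaForSequenceClassification
# instance instead of the toy below.
def report_dropout(model: nn.Module) -> dict:
    return {name: m.p for name, m in model.named_modules()
            if isinstance(m, nn.Dropout)}

toy = nn.Sequential(nn.Dropout(p=0.0), nn.Linear(4, 2), nn.Dropout(p=0.1))
print(report_dropout(toy))  # {'0': 0.0, '2': 0.1}
```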
Thanks in advance for any help!