I'm fine-tuning a `RobertaForSequenceClassification` model. What does `model.eval()` do for this model? Does it only deactivate the dropout layers?
In other words, are the following two snippets functionally equivalent?
Snippet A:

```python
from transformers import AutoModelForSequenceClassification

model_name = "roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()
```
Snippet B:

```python
from transformers import AutoModelForSequenceClassification, AutoConfig

model_name = "roberta-base"
config = AutoConfig.from_pretrained(
    model_name,
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
    num_labels=1,
)
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
```
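For context, here is my understanding of what `eval()` does, sketched with a tiny stand-in model (plain `torch.nn` modules, not the actual RoBERTa weights): `eval()` just recursively sets the `training` flag to `False` on every submodule, and `nn.Dropout` becomes a no-op when that flag is off.

```python
import torch
from torch import nn

# Tiny stand-in for a transformer block: eval()/train() only flip the
# `training` flag on every submodule; Dropout is the module type whose
# forward behavior depends on that flag (absent BatchNorm etc.).
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.1))

model.eval()
print(all(not m.training for m in model.modules()))  # True

# In eval mode, Dropout passes its input through unchanged:
x = torch.ones(2, 4)
drop = model[1]
print(torch.equal(drop(x), x))  # True
```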
The reason I'm asking is that when training `RobertaForSequenceClassification` (with the default config) on my data, training with `model.train()` doesn't converge, but training with `model.eval()` does (all else being equal). I thought the only difference was that `model.eval()` deactivates the dropout layers in RoBERTa, since there are no BatchNorm layers etc. So I tried manually setting `hidden_dropout_prob=0.0` and `attention_probs_dropout_prob=0.0` as in Snippet B and training with `model.train()`. However, this still doesn't converge, which means Snippet B isn't equivalent to Snippet A. What might be wrong here?
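One check I can run (a sketch, using a toy `nn.Sequential` in place of the real model) is to enumerate every `nn.Dropout` submodule and its `p`, to see whether the config in Snippet B really zeroed all of them; I'm assuming a classification head could carry its own dropout probability that the two config keys don't cover.

```python
from torch import nn

# Diagnostic: list every Dropout submodule and its probability.
# With the real model, pass the loaded RobertaForSequenceClassification
# instance instead of the toy below.
def report_dropout(model: nn.Module) -> dict:
    return {name: m.p for name, m in model.named_modules()
            if isinstance(m, nn.Dropout)}

toy = nn.Sequential(nn.Dropout(p=0.0), nn.Linear(4, 2), nn.Dropout(p=0.1))
print(report_dropout(toy))  # {'0': 0.0, '2': 0.1}
```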
Thanks in advance for any help!