Is model.eval() equivalent to setting dropout to 0?

Hi,

I’m fine-tuning a RobertaForSequenceClassification model. What exactly does model.eval() do for this model? Does it only deactivate the dropout layers?
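As far as I can tell, eval() just recursively flips each submodule’s training flag, and Dropout is the only module type in this model whose forward pass reads that flag. Here is a minimal sketch I used to check (assuming roberta-base):

import torch.nn as nn
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)
model.eval()  # recursively sets module.training = False on every submodule

# List the Dropout modules, the ones whose behavior depends on the training flag
dropouts = [name for name, m in model.named_modules() if isinstance(m, nn.Dropout)]
print(f"{len(dropouts)} Dropout modules; model.training = {model.training}")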

In other words, are the following two snippets functionally equivalent?

Snippet A:

from transformers import AutoModelForSequenceClassification

model_name = "roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()  # put every submodule into evaluation mode (deactivates dropout)

Snippet B:

from transformers import AutoModelForSequenceClassification, AutoConfig

model_name = "roberta-base"
# zero out both dropout probabilities via the config
config = AutoConfig.from_pretrained(
    model_name, hidden_dropout_prob=0.0, attention_probs_dropout_prob=0.0, num_labels=1
)
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)

The reason I’m asking is that when training RobertaForSequenceClassification (with the default config) on my data, I found that training with model.train() doesn’t converge, but training with model.eval() does, with everything else kept the same. I assumed the only difference was that model.eval() deactivates the Dropout layers in RoBERTa, since the model contains no BatchNorm layers or other modules whose behavior changes between training and evaluation mode.
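To double-check that assumption, I enumerated the module types with a quick sketch like this (again assuming roberta-base):

from collections import Counter
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)

# Count module classes whose name mentions Dropout or Norm; RoBERTa should
# contain Dropout and LayerNorm, but no BatchNorm variants.
counts = Counter(type(m).__name__ for m in model.modules())
print({name: n for name, n in counts.items() if "Dropout" in name or "Norm" in name})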

So I tried manually setting hidden_dropout_prob=0.0 and attention_probs_dropout_prob=0.0 as in Snippet B and training with model.train(). However, this still doesn’t converge, which suggests Snippet B is not equivalent to Snippet A. What might be wrong here?
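Here is a sketch of how I would test forward-pass equivalence of the two snippets directly. Note that the weights have to be copied from one model to the other first, because the new classification head (num_labels=1) is randomly initialized on each from_pretrained call:

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-base"
tok = AutoTokenizer.from_pretrained(model_name)
inputs = tok("a quick equivalence check", return_tensors="pt")

# Snippet A: default config, eval mode
model_a = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model_a.eval()

# Snippet B: dropout zeroed in the config, train mode
config = AutoConfig.from_pretrained(
    model_name, hidden_dropout_prob=0.0, attention_probs_dropout_prob=0.0, num_labels=1
)
model_b = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
model_b.load_state_dict(model_a.state_dict())  # identical weights in both models
model_b.train()

with torch.no_grad():
    logits_a = model_a(**inputs).logits
    logits_b = model_b(**inputs).logits
print(torch.allclose(logits_a, logits_b))  # True would mean the forward passes match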

Thanks in advance for any help!
