I'm fine-tuning a `RobertaForSequenceClassification` model. What does `model.eval()` do for this model? Does it only deactivate the dropout layers?
In other words, are the following two snippets functionally equivalent?
Snippet A:

```python
from transformers import AutoModelForSequenceClassification

model_name = "roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()
```
Snippet B:

```python
from transformers import AutoModelForSequenceClassification, AutoConfig

model_name = "roberta-base"
config = AutoConfig.from_pretrained(
    model_name,
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
    num_labels=1,
)
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
```
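For context, here is my understanding of what `eval()` does, sketched with a tiny stand-in model (plain `torch.nn` modules, not the actual RoBERTa weights): `eval()` just recursively sets the `training` flag to `False` on every submodule, and `nn.Dropout` becomes a no-op when that flag is off.

```python
import torch
from torch import nn

# Tiny stand-in for a transformer block: eval()/train() only flip the
# `training` flag on every submodule; Dropout is the module type whose
# forward behavior depends on that flag (absent BatchNorm etc.).
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.1))

model.eval()
print(all(not m.training for m in model.modules()))  # True

# In eval mode, Dropout passes its input through unchanged:
x = torch.ones(2, 4)
drop = model[1]
print(torch.equal(drop(x), x))  # True
```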
The reason I'm asking is that when training `RobertaForSequenceClassification` (with the default config) on my data, training with `model.train()` doesn't converge, but training with `model.eval()` does (all else being equal). I thought the only difference was that `model.eval()` deactivates the dropout layers in RoBERTa, since there are no BatchNorm layers etc. So I tried manually setting `hidden_dropout_prob=0.0` and `attention_probs_dropout_prob=0.0` as in Snippet B and training with `model.train()`. However, this still doesn't converge, which means Snippet B isn't equivalent to Snippet A. What might be wrong here?
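One check I can run (a sketch, using a toy `nn.Sequential` in place of the real model) is to enumerate every `nn.Dropout` submodule and its `p`, to see whether the config in Snippet B really zeroed all of them; I'm assuming a classification head could carry its own dropout probability that the two config keys don't cover.

```python
from torch import nn

# Diagnostic: list every Dropout submodule and its probability.
# With the real model, pass the loaded RobertaForSequenceClassification
# instance instead of the toy below.
def report_dropout(model: nn.Module) -> dict:
    return {name: m.p for name, m in model.named_modules()
            if isinstance(m, nn.Dropout)}

toy = nn.Sequential(nn.Dropout(p=0.0), nn.Linear(4, 2), nn.Dropout(p=0.1))
print(report_dropout(toy))  # {'0': 0.0, '2': 0.1}
```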
Thanks in advance for any help!