Ensuring Consistent Results: Reproducibility with BERT

Hi everyone,

I have a quick question. I am running the BERT model bert-base-multilingual-cased to classify questions into categories. Each time I run the model, I get slightly different classification results. For example, the number of rows assigned to category 1 varies from 2,400 to 3,000, and the accuracy ranges from 77% to 80.5%.

Is this normal, or am I making a mistake?

How can I solve this problem to ensure my results are consistent?

Thanks!

Hello!

Do you run the model locally or on cloud GPUs? Results can vary with the underlying hardware (floating-point precision, for example).
Also, just to be sure: is the model input only the question sentence, or does it include additional elements? Did you fine-tune the whole model or only the classification head?

Hope this helps!

Hello.
If you are asking about inconsistent results, do_sample=True is sometimes the cause. With do_sample=True, text generation samples tokens at random, so the output is not repeatable unless the random seed is fixed.
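
Whether or not sampling is involved, a common first step for run-to-run differences is to fix every random seed before building the model and the data loaders. Below is a minimal sketch of that idea, not a guaranteed fix. The helper name seed_everything and the seed value 42 are my own choices for illustration, and I am assuming a PyTorch setup; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Seed every random number generator available in this environment."""
    # Python's built-in RNG (used by some shuffling and sampling code).
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, so there is nothing to seed.

    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed, so there is nothing more to seed.

    torch.manual_seed(seed)           # CPU random number generator
    torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA
    # Ask cuDNN for deterministic kernels and disable its auto-tuner,
    # which may pick different algorithms from one run to the next.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Quick check: the same seed gives the same sequence of random numbers.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Call it once at the very top of the script, before the model is created, so that a freshly initialized classification head gets the same starting weights on every run. If you use the transformers library, its set_seed function covers similar ground. Also call model.eval() before predicting so that dropout is switched off. Even with all of this, small differences can remain across different GPUs or library versions.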