I have a quick question. I am running the BERT model bert-base-multilingual-cased to predict a category for each question. Each time I run my model, I get slightly different classification results. For example, the number of rows assigned to category 1 varies from 2,400 to 3,000, and the accuracy ranges from 77% to 80.5%.
Is this normal, or am I making a mistake?
How can I solve this problem to ensure my results are consistent?
Do you run the model locally or on cloud GPUs? Results can vary with the underlying hardware the model runs on (floating-point precision, etc.).
Also, to be sure: is the model's input solely the question, or are there additional elements? Did you fine-tune the whole model or only the classification head?
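If the variation comes from unseeded randomness (classification head initialization, data shuffling, dropout), fixing the seeds before each run is usually the first thing to try. Below is a minimal sketch, assuming a PyTorch-based setup; `seed_everything` is a hypothetical helper name, not something from your code, and the NumPy and PyTorch parts are skipped if those libraries are not installed.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Hypothetical helper: seed every random number generator that is available."""
    random.seed(seed)
    # Only affects Python processes started after this point, not the current one.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        # Prefer deterministic cuDNN kernels over the fastest ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same random draws.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```

Call it once at the very start of the script, before the model and the data loaders are created. If you use the transformers library, `transformers.set_seed(seed)` covers the Python, NumPy, and PyTorch seeding in one call. Even with fixed seeds, some GPU operations are not deterministic, so small differences between runs or between machines can remain.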