Determinism in sequence classification

agarfau · July 1, 2021, 1:17pm

Hi! I noticed that in order to get reproducible results across runs when carrying out sequence classification, the type conversion for the labels from e.g. string to int has to be done consistently, i.e. every class has to be assigned to the same int across runs.

Say I have 5 classes: ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, and they get mapped to 0, 1, 2, 3, 4, respectively. If I run the exact same experiment and change only this mapping to, say, 4, 3, 2, 1, 0, the results won’t be the same. I’m assuming the class distribution is not known (i.e. the dataset can be imbalanced).

In fact, this is also mentioned in a comment in the run_glue.py example (here).
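
For reference, one way to keep the mapping consistent is to sort the distinct labels before enumerating them, which as far as I can tell is what that comment in run_glue.py is about. A minimal sketch, where `build_label2id` and the toy label lists are my own hypothetical names and not taken from the script:

```python
def build_label2id(labels):
    """Map each distinct label to an int id.

    Sorting first makes the mapping independent of the order in which
    the labels happen to appear in the data.
    """
    label_list = sorted(set(labels))
    return {label: i for i, label in enumerate(label_list)}


# Two hypothetical runs that see the same classes in a different order.
run_1_labels = ["c", "a", "e", "b", "d", "a"]
run_2_labels = ["e", "d", "a", "c", "b", "a"]

label2id_run_1 = build_label2id(run_1_labels)
label2id_run_2 = build_label2id(run_2_labels)
```

Both runs end up with the same mapping, whatever order the examples arrive in.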

My question is simply: why? Why does this matter when computing cross entropy loss?

Thanks

sgugger · July 1, 2021, 1:21pm

Even if you use the same seed, you will not get the same results. The last layer of the model, the classification head, is initialized randomly, and for a given seed that initialization is the same whether your labels are mapped to 0, 1, 2, 3, 4 or to another permutation. That means the initial weights for ‘a’ will be different depending on whether ‘a’ is mapped to 0 or to 4. From there, you get different losses, hence different gradients and different updates, and you end up with a completely different model.
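
Here is a rough illustration of that point in plain Python. It is only a sketch: `init_head` is a made-up stand-in for the randomly initialized classification head (not the actual Transformers initialization), and the sizes, seed and feature vector are arbitrary assumptions.

```python
import math
import random

NUM_CLASSES = 5
HIDDEN_SIZE = 3  # tiny on purpose, just for illustration


def init_head(seed):
    """Stand-in for a classification head: one weight row per class id."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
            for _ in range(NUM_CLASSES)]


def cross_entropy(head, features, target):
    """Cross entropy of a single example given its target class id."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in head]
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target]


labels = ["a", "b", "c", "d", "e"]
mapping_1 = {label: i for i, label in enumerate(labels)}      # a -> 0, ..., e -> 4
mapping_2 = {label: 4 - i for i, label in enumerate(labels)}  # a -> 4, ..., e -> 0

# Same seed, so both runs start from the exact same weight matrix.
head_1 = init_head(seed=42)
head_2 = init_head(seed=42)

# But the row that plays the role of class 'a' is a different one.
row_a_1 = head_1[mapping_1["a"]]
row_a_2 = head_2[mapping_2["a"]]

# So the very first loss on an example labelled 'a' already differs.
features = [1.0, -2.0, 0.5]
loss_1 = cross_entropy(head_1, features, mapping_1["a"])
loss_2 = cross_entropy(head_2, features, mapping_2["a"])
```

The two heads are identical, but `row_a_1` and `row_a_2` are not, so `loss_1` and `loss_2` differ from the first step on.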

agarfau · July 1, 2021, 1:33pm

Of course! It sounds pretty obvious when you read it. Thanks a lot!
