Determinism in sequence classification

Hi! I noticed that, in order to get reproducible results across runs when carrying out sequence classification, the type conversion of the labels from e.g. string to int has to be done consistently, i.e. every class has to be assigned the same int across runs.

Say I have 5 classes: ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, and they get mapped to 0, 1, 2, 3, 4, respectively. If I run the exact same experiment but change this mapping to, say, 4, 3, 2, 1, 0, the results won’t be the same. I’m assuming the class distribution is not known (i.e. the dataset can be imbalanced).
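For reference, one way to make the mapping deterministic across runs is to derive it from the sorted set of class names. A minimal sketch (the toy labels and variable names are just for illustration, not from any particular library):

```python
# Build a deterministic label -> int mapping by sorting the class names,
# so every run assigns the same integer to the same class.
labels = ["b", "e", "a", "c", "d", "a"]  # toy example labels

label2id = {label: i for i, label in enumerate(sorted(set(labels)))}
id2label = {i: label for label, i in label2id.items()}

print(label2id)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4} on every run
int_labels = [label2id[label] for label in labels]
```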

In fact, this is also mentioned in a comment in the run_glue.py example (here).

My question is simply: why? Why does this matter when computing the cross-entropy loss?

Thanks 🙂

Even if you use the same seed, you will not get the same results, because the last layer of the model, the classification head, is initialized randomly in the same way whether your labels are mapped to 0, 1, 2, 3, 4 or to another permutation. That means the initial weights for ‘a’ will be different depending on whether ‘a’ is mapped to 0 or to 4. From there, you will get different losses, so different gradients and different updates, and you will end up with a completely different model.
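To make this concrete, here is a minimal sketch (the feature size and class count are made up) showing that a fixed seed produces the exact same initial head weights regardless of the label mapping, so each class name inherits different initial weights under a different permutation:

```python
import torch
import torch.nn as nn

# With a fixed seed, the classification head is initialized to the exact
# same tensor, regardless of which class name each output row represents.
torch.manual_seed(42)
head_run1 = nn.Linear(8, 5)  # run 1: labels mapped a->0, b->1, c->2, d->3, e->4

torch.manual_seed(42)
head_run2 = nn.Linear(8, 5)  # run 2: labels mapped a->4, b->3, c->2, d->1, e->0

# The raw weight matrices are identical across runs...
assert torch.equal(head_run1.weight, head_run2.weight)

# ...but the row that scores class 'a' is row 0 in run 1 and row 4 in run 2,
# so 'a' starts from different initial weights, giving different logits,
# losses, gradients, and ultimately a different model.
print(torch.equal(head_run1.weight[0], head_run2.weight[4]))  # False
```

In other words, the seed fixes the rows of the weight matrix, but the label mapping decides which class each row belongs to.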


Of course! It sounds pretty obvious when you read it. Thanks a lot!
