Hi! I noticed that in order to get reproducible results across runs when carrying out sequence classification, the type conversion for the labels from e.g. string to int has to be done consistently, i.e. every class has to be assigned to the same int across runs.
Say I have 5 classes: ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, and they get mapped to 0, 1, 2, 3, 4, respectively. If I run the exact same experiment but change this mapping to, say, 4, 3, 2, 1, 0, the results won’t be the same. I’m assuming the class distribution is not known in advance (i.e. the dataset can be imbalanced).
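For context, the way I make the mapping consistent is by sorting the label set before assigning ints, roughly like this (a minimal sketch; the variable names are just for illustration):

```python
# Build a deterministic label-to-int mapping by sorting the class names,
# so every run assigns the same integer id to the same class.
raw_labels = ['b', 'e', 'a', 'd', 'c']  # hypothetical raw string labels

label2id = {lab: i for i, lab in enumerate(sorted(set(raw_labels)))}
# -> {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

int_labels = [label2id[lab] for lab in raw_labels]
# -> [1, 4, 0, 3, 2]
```

With this in place, runs are reproducible; it’s only when the assignment changes between runs (e.g. iterating over an unordered set) that results diverge.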
In fact, this is also mentioned in a comment in the run_glue.py example (here).
My question is simply: why? Why does this matter when computing cross entropy loss?