Hi,
I’m trying to train distilgpt2 following the given tutorial:
My notebook has more or less the same code, except that I am using this dataset:
For some reason, the training loss is reported as 0.0 right from the first logged step, and I don’t know why.