Training loss is 0.0 right from the start

I’m trying to train distilgpt2 following the given tutorial:

My notebook has more or less the same code, except that I am using this dataset:

For some reason, the training loss is reported as 0.0 from the very first step, and I can’t figure out why.