Shifting ids to the right when training GPT-2 on text generation?


I am slightly confused about how exactly to prepare the training data for GPT-2 text generation.

In order to train, you have to provide input_ids (inputs) and labels (outputs). Both are supposed to be lists of token indices. This is the easy part.

Question: Are input_ids and labels supposed to be absolutely identical, or are the labels supposed to be input_ids shifted one position to the right?


During training, the labels are shifted inside the model (see the docs), so you should pass labels equal to input_ids.
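To illustrate (a toy sketch with plain Python lists, not actual model code; the token ids are made up): even though you pass labels identical to input_ids, the model internally pairs the prediction at position t with the label at position t+1, so it still trains next-token prediction.

```python
# Toy illustration: labels identical to input_ids, no manual shift.
input_ids = [15496, 11, 995]  # hypothetical token ids
labels = input_ids            # pass the same list as labels

# The (context token, target token) pairs the loss effectively sees
# after the internal shift:
pairs = list(zip(input_ids[:-1], labels[1:]))
print(pairs)  # [(15496, 11), (11, 995)]
```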

Thanks! Is this true for the TFGPT2LMHeadModel, too?

Hi Tristan,
This is a good question! Did you work out definitively whether the labels are shifted under the hood in the TF implementation? There’s no mention of whether this is done automatically in the TF docs (unlike the PyTorch docs, where it is explicitly mentioned).

In the TensorFlow code here, it looks like the shift is done for us. There’s more going on in the PyTorch implementation (code here); I’m assuming that’s just a nuance of the differences between the TF and PyTorch implementations.
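The shift in both implementations boils down to the same slicing pattern, which can be sketched like this (plain Python lists for clarity; the real code slices tensors, and the logits here are just placeholders):

```python
# Rough sketch of the shift done inside the model's loss computation:
# drop the logits for the final position (there is no next token to
# predict) and drop the first label (no logits predict it).
def shifted_loss_inputs(logits, labels):
    shift_logits = logits[:-1]
    shift_labels = labels[1:]
    return shift_logits, shift_labels

labels = [464, 3797, 3332, 319]    # hypothetical token ids
logits = ["p0", "p1", "p2", "p3"]  # placeholder per-position logits
sl, sb = shifted_loss_inputs(logits, labels)
print(sl)  # ['p0', 'p1', 'p2']
print(sb)  # [3797, 3332, 319]
```

The cross-entropy loss is then computed between these aligned slices, which is why passing labels equal to input_ids is the right thing to do.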