I am slightly confused about how exactly to prepare the training data for GPT-2 text generation.
To train the model, you have to provide input_ids (inputs) and labels (outputs). Both are supposed to be lists of token indices. This is the easy part.
Question: Are input_ids and labels supposed to be absolutely identical, or are the labels supposed to be input_ids shifted by one position, so that each label is the next token?
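
To make the two options concrete, here is a small sketch of what I mean. The token ids and variable names are made up for illustration and do not come from a real tokenizer; which option is correct is exactly what I am asking.

```python
# Hypothetical token ids for one short training sequence
# (illustrative values only, not produced by a real tokenizer).
tokens = [101, 202, 303, 404, 505]

# Option A: labels are an exact copy of input_ids.
input_ids_a = list(tokens)
labels_a = list(tokens)

# Option B: the shift is done by hand, so the label at position i
# is the token at position i + 1 (the "next token").
input_ids_b = tokens[:-1]  # [101, 202, 303, 404]
labels_b = tokens[1:]      # [202, 303, 404, 505]
```

In option B both lists end up one element shorter than the original sequence, because the last token has no successor to predict.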