Shifting ids to the right when training GPT-2 on text generation?


I am slightly confused about how exactly to prepare the training data for GPT-2 text generation.

In order to train, you have to provide input_ids (inputs) and labels (outputs). Both are supposed to be lists of token indices. This is the easy part.

Question: Are input_ids and labels supposed to be absolutely identical, or are the labels supposed to be input_ids shifted one position to the right?


During training, the labels are shifted inside the model (see the docs), so you should pass labels equal to input_ids.
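To illustrate (a toy sketch with plain Python lists, not actual model code; the token ids are made up): even though you pass labels identical to input_ids, the model internally pairs the prediction at position t with the label at position t+1, so it still trains next-token prediction.

```python
# Toy illustration: labels identical to input_ids, no manual shift.
input_ids = [15496, 11, 995]  # hypothetical token ids
labels = input_ids            # pass the same list as labels

# The (context token, target token) pairs the loss effectively sees
# after the internal shift:
pairs = list(zip(input_ids[:-1], labels[1:]))
print(pairs)  # [(15496, 11), (11, 995)]
```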

Thanks! Is this true for the TFGPT2LMHeadModel, too?

Hi Tristan,
This is a good question! Did you work out definitively whether the labels are shifted under the hood in the TF implementation? There’s no mention of whether this is done automatically in the TF docs (unlike the PyTorch docs, where it is explicitly mentioned).

In the TensorFlow code here, it looks like the shift is done for us. There’s more going on in the PyTorch implementation (code here); I’m assuming that’s just a nuance of the differences between the TF and PyTorch implementations.
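The shift in both implementations boils down to the same slicing pattern, which can be sketched like this (plain Python lists for clarity; the real code slices tensors, and the logits here are just placeholders):

```python
# Rough sketch of the shift done inside the model's loss computation:
# drop the logits for the final position (there is no next token to
# predict) and drop the first label (no logits predict it).
def shifted_loss_inputs(logits, labels):
    shift_logits = logits[:-1]
    shift_labels = labels[1:]
    return shift_logits, shift_labels

labels = [464, 3797, 3332, 319]    # hypothetical token ids
logits = ["p0", "p1", "p2", "p3"]  # placeholder per-position logits
sl, sb = shifted_loss_inputs(logits, labels)
print(sl)  # ['p0', 'p1', 'p2']
print(sb)  # [3797, 3332, 319]
```

The cross-entropy loss is then computed between these aligned slices, which is why passing labels equal to input_ids is the right thing to do.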