How is the data shifted by one token during CausalLM fine-tuning?

There is an explanation in the documentation of how labels are shifted inside the model: Causal language modeling

Also, there is a PR in transformers github repo on this: Shifting labels for causal LM when using label smoother by seungeunrho · Pull Request #17987 · huggingface/transformers · GitHub

So, shifting is handled inside the model. `input_ids` and `labels` can be the very same tensor; the model applies the causal shift internally.
For example, assume `input_ids` is [1,2,3,4,5,6,7,8] and `labels` is the same tensor [1,2,3,4,5,6,7,8]. Inside the model, the logits are shifted left and the labels right, so the predictions at positions [1,2,3,4,5,6,7] are trained against the targets [2,3,4,5,6,7,8]: each position predicts the next token, the first token has no prediction target, and the last position's logit is dropped.
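A minimal sketch of that shift in plain Python (an assumption mirroring what the transformers modeling code does with `shift_logits = logits[..., :-1, :]` and `shift_labels = labels[..., 1:]`; the real code operates on tensors of logits, not raw token ids):

```python
input_ids = [1, 2, 3, 4, 5, 6, 7, 8]
labels = list(input_ids)  # labels can be the very same tokens

# Inside the model: drop the last position's prediction and the
# first label, so position i is trained to predict token i + 1.
shift_inputs = input_ids[:-1]  # positions that produce predictions
shift_labels = labels[1:]      # the next-token targets

# Each pair is (token seen at position i, token to predict).
pairs = list(zip(shift_inputs, shift_labels))
print(pairs)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
```

So you never need to shift the data yourself when passing `labels` to the model; doing so would shift it twice.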