The documentation explains how labels are shifted inside the model; see the "Causal language modeling" page.
There is also a PR in the transformers GitHub repo about this: "Shifting labels for causal LM when using label smoother" by seungeunrho (Pull Request #17987, huggingface/transformers).
So the shifting is handled inside the model. ‘input_ids’ and ‘labels’ can be the very same tensor; the model applies the causal shift internally when it computes the loss.
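For example, suppose ‘input_ids’ is [1,2,3,4,5,6,7,8] and ‘labels’ is the same tensor [1,2,3,4,5,6,7,8]. The model drops the logits at the last position and the first label, so the outputs at the positions of [1,2,3,4,5,6,7] are scored against the targets [2,3,4,5,6,7,8]. The first token is never a prediction target, because nothing precedes it.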
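
The shift can be sketched in plain Python. This is only an illustration, not the transformers implementation: the helper names shift_for_causal_lm and causal_lm_loss are made up for this post, the logits are toy all-zero values over an assumed vocabulary of 9 tokens, and -100 is assumed as the ignore index (the default of PyTorch's cross-entropy loss).

```python
import math

def shift_for_causal_lm(per_position, labels):
    """Pair the item at position t with the label at position t + 1."""
    shifted_positions = per_position[:-1]  # last position has no next token to predict
    shifted_labels = labels[1:]            # first label has no preceding context
    return shifted_positions, shifted_labels

def cross_entropy(logits_row, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target]

def causal_lm_loss(logits, labels, ignore_index=-100):
    """Mean next-token cross-entropy after the causal shift (hypothetical helper)."""
    shift_logits, shift_labels = shift_for_causal_lm(logits, labels)
    losses = [
        cross_entropy(row, y)
        for row, y in zip(shift_logits, shift_labels)
        if y != ignore_index  # skip masked positions
    ]
    return sum(losses) / len(losses)

input_ids = [1, 2, 3, 4, 5, 6, 7, 8]
labels = list(input_ids)  # same values as input_ids, no manual shifting

# Which input position is scored against which target token.
context, targets = shift_for_causal_lm(input_ids, labels)
for c, t in zip(context, targets):
    print(f"output at token {c} is scored against target {t}")

# Toy logits: one row per input position, uniform over 9 token ids (0..8).
vocab_size = 9
logits = [[0.0] * vocab_size for _ in input_ids]
loss = causal_lm_loss(logits, labels)
print(loss)
```

With uniform logits, every target has probability 1/9, so the loss equals log(9) averaged over the 7 scored positions. The point is that the caller passes labels unshifted and the shift happens in the loss computation.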