Does the transformer automatically shift by one position when calculating the autoregressive loss during the forward pass?
In this example — https://huggingface.co/learn/nlp-course/en/chapter7/6?fw=pt#training-with-accelerate — the labels need to be shifted. Is that because the loss function is defined separately there? Or do we always need to do the shift ourselves?
loss_fct = torch.nn.CrossEntropyLoss()  # standard next-token cross-entropy
shift_logits = logits[..., :-1, :].contiguous()  # drop the last position (it has no target)
shift_labels = labels[..., 1:].contiguous()      # drop the first position (nothing predicts it)
loss = loss_fct(shift_logits.view(-1, vocab_size), shift_labels.view(-1))
Is this shift handled internally?
It seems like you have to do it manually…?
It depends on how you compute the loss. When you pass `labels` to a `*ForCausalLM` model (e.g. `GPT2LMHeadModel`), the model's forward shifts the logits and labels internally before computing the cross-entropy, so `outputs.loss` is already the correct next-token loss. But when you define the loss function separately — as in the linked Accelerate example — the model only returns raw logits, and you are responsible for shifting them yourself, as in your code, so that each position's prediction is compared against the next token in the sequence.
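To see why the shift produces the right alignment, here is a minimal plain-Python sketch (no torch, illustrative token strings only): at position i the model predicts the token at position i + 1, so predictions keep positions 0..n-2 and targets keep positions 1..n-1 — exactly what `logits[..., :-1, :]` and `labels[..., 1:]` do.

```python
# Illustrative token sequence (names are made up for the example).
tokens = ["<bos>", "The", "cat", "sat", "<eos>"]

# Mirror the tensor slicing with list slices:
# predictions drop the last position, targets drop the first.
inputs_for_loss = tokens[:-1]   # like logits[..., :-1, :]
targets_for_loss = tokens[1:]   # like labels[..., 1:]

pairs = list(zip(inputs_for_loss, targets_for_loss))
print(pairs)
# [('<bos>', 'The'), ('The', 'cat'), ('cat', 'sat'), ('sat', '<eos>')]
```

Each pair is (context position, token the model should predict there), which is what the shifted cross-entropy compares.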
Answer: It depends. If you pass labels directly to a causal LM head, e.g. `outputs = model(input_ids, labels=input_ids)`, the shifting is handled internally and `outputs.loss` is already the correct autoregressive loss. If you compute the loss from the raw logits yourself, as in the linked Accelerate example, you need to shift the logits and labels manually, as shown in your code [1].