Hi,
Yes, for any causal LLM that you can train with the Transformers library, the model will internally shift the labels one position so that it learns to predict the next token. The convenience of this is that users can just copy the labels from the inputs, i.e. labels = input_ids.clone() - although users then typically also replace tokens which the model shouldn't learn to predict (like padding tokens) with -100, so they are ignored by the loss.
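In code, that could look something like this (a minimal sketch; the model and example sentences are just placeholders):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = tokenizer(["Hello world", "A somewhat longer example sentence"],
                  padding=True, return_tensors="pt")

# Copy the inputs as labels; the model shifts them internally.
labels = batch["input_ids"].clone()
# Replace padding tokens by -100 so the loss ignores them.
labels[batch["attention_mask"] == 0] = -100

outputs = model(**batch, labels=labels)
print(outputs.loss)
```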
Visually (taken from my explanation here):
As can be seen, the labels (top row) are equal to the inputs (bottom row), just shifted one position to the left, and with tokens which the model shouldn't learn to predict (like the special <|begin_of_text|> token in the figure above) replaced by -100.
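The shift itself happens inside the model's loss computation. Roughly (simplified sketch, not the exact library code), it boils down to:

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    # Drop the last logit and the first label, so the prediction at position i
    # is compared against the token at position i + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # ignore_index=-100 makes the loss skip padding/special tokens.
    return F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                           shift_labels.view(-1), ignore_index=-100)
```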