Use of "input_ids,token_type_ids and lm_labels" in BERT Language model

RichardWang · September 20, 2020, 2:05am

Hi @vikasRajashekar,
I assume that by lm_labels you mean labels, and by -1 you mean -100. (see docs here)

Yes

The model tries to learn to predict the last four tokens from the context. The context is all of the input tokens, including the last four: even when those four are masked or replaced, they still contribute correct position information to the context. In any case, all input tokens are used to predict the last four tokens.
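
To make the labelling concrete, here is a minimal pure-Python sketch of the idea, not the library's actual implementation. The helper names (mask_last_n, masked_lm_loss), the toy vocabulary of 10 ids, and the mask id 0 are all assumptions made up for illustration. It shows that every position stays in the input, while only positions whose label is not -100 contribute to the loss.

```python
import math

IGNORE_INDEX = -100  # label value that the loss skips, as discussed above


def mask_last_n(input_ids, n, mask_id):
    """Mask the last n tokens; only those positions get a real label."""
    cut = len(input_ids) - n
    masked_inputs = input_ids[:cut] + [mask_id] * n   # all positions stay in the input
    labels = [IGNORE_INDEX] * cut + input_ids[cut:]   # only the last n are targets
    return masked_inputs, labels


def masked_lm_loss(logits, labels):
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX."""
    losses = []
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # this position is context only, not a prediction target
        log_norm = math.log(sum(math.exp(x) for x in row))
        losses.append(log_norm - row[label])
    return sum(losses) / len(losses)


# Toy example: vocabulary of 10 ids, id 0 plays the role of [MASK] (hypothetical).
input_ids = [2, 5, 7, 3, 1, 4, 6, 8]
masked, labels = mask_last_n(input_ids, n=4, mask_id=0)

# Stand-in for model output: one row of vocabulary logits per input position.
logits = [[0.0] * 10 for _ in masked]
loss = masked_lm_loss(logits, labels)

# Changing the logits at an ignored position leaves the loss unchanged.
logits[0][3] = 50.0
loss_after = masked_lm_loss(logits, labels)
```

In Transformers, the same convention applies when you pass labels to BertForMaskedLM: positions set to -100 are left out of the loss, but their tokens are still part of the input the model attends to.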
