Why does modeling_bert support next-token prediction?

That’s because some users were interested in initializing decoders with the weights of BERT. It was added mainly for the EncoderDecoderModel class, where the weights of both the encoder and the decoder can be initialized from a pre-trained BERT checkpoint. See the blog post Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models.
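For illustration, here is a minimal sketch (assuming a recent version of transformers, with "bert-base-uncased" as a placeholder checkpoint) of both paths: loading BERT directly as a causal, next-token decoder via BertLMHeadModel, and warm-starting an EncoderDecoderModel where both halves come from a pre-trained BERT checkpoint:

```python
from transformers import BertConfig, BertLMHeadModel, EncoderDecoderModel

# BERT as a standalone causal decoder: is_decoder=True switches the
# self-attention to a causal mask, so the LM head predicts the next token.
config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

# Warm-starting a seq2seq model: both the encoder and the decoder are
# initialized from BERT weights; cross-attention layers are added to the
# decoder and trained from scratch.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
```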
