[Suggestions and Guidance] Finetuning BERT models for next word prediction

Hi Sumanth! I believe you are already on the right track by finetuning GPT-2. The key difference is that GPT-2 was trained with causal (autoregressive) attention: it learns to predict the next word using only the tokens to its left, whereas BERT's masked-language-modeling objective lets it see the context on both sides of the masked token.
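
If it helps, here is a quick sketch (just an illustration, not from the original thread; the model name and prompt are placeholders) of how GPT-2's causal LM head scores candidates for the next token using only the left context:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Plain pretrained GPT-2; swap in your finetuned checkpoint path if you have one.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"  # example prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every vocabulary item as the *next*
# word; causal attention guarantees no position ever attended to tokens on
# its right.
next_token_logits = logits[0, -1]
top5 = torch.topk(next_token_logits, k=5).indices
print([tokenizer.decode(idx) for idx in top5.tolist()])
```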

The different models and their architectures are depicted in this chart:

Long story short: you should see better results with GPT-2. Let us know how it goes.
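
In case it is useful, a rough finetuning sketch with the transformers Trainer might look something like this (dataset, hyperparameters, and output path are only placeholders; adjust them for your own data):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Placeholder corpus; replace with your own text dataset.
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda x: len(x["input_ids"]) > 0)  # drop empty lines

# mlm=False gives the standard next-token (causal) objective instead of masked LM.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-next-word",       # placeholder output path
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```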

Cheers
Heiko
