Problem Statement: Build a next-word prediction model on legal text. The aim is an autocomplete model that uses the text typed so far, and possibly a concatenation of vectors from prior clauses/paragraphs.
Current Approach: Because BERT-based models are trained with a masked language modelling objective, pretrained models such as LegalBERT did not give good accuracy when the word to be predicted was marked as [MASK]. For example, in the sentence "use of [MASK]", the next word should be predicted in place of the [MASK] token. (Note that there are no words after the mask token, only before it.)
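For context, the masked-LM attempt was queried along these lines (a minimal sketch; it assumes the nlpaueb/legal-bert-base-uncased checkpoint for LegalBERT). With the mask at the very end, the model only has left-hand context to work with:

```python
from transformers import pipeline

# Fill-mask with a BERT-style model; the mask sits at the end of the typed text,
# so there is no right-hand context for the model to attend to.
fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-base-uncased")

for pred in fill_mask("use of [MASK]"):
    print(pred["token_str"], round(pred["score"], 3))
```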
I am currently approaching the problem as a sequence-classification problem, where the labels are the token ids of the words to be predicted next. I will also attempt to fine-tune GPT-2 on the legal text using run_clm.py from the Hugging Face examples directory (see the sketch below).
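For reference, a minimal sketch of what the causal-LM fine-tuning could look like, roughly what run_clm.py does under the hood. The file name legal_corpus.txt and the hyperparameters are placeholders, not values from the original setup:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Plain-text corpus of legal clauses, one or more paragraphs per line (placeholder path)
dataset = load_dataset("text", data_files={"train": "legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> causal (next-token) objective; labels are created from the inputs
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(output_dir="gpt2-legal",
                         num_train_epochs=3,
                         per_device_train_batch_size=4)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  data_collator=collator)
trainer.train()
```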
Is there a better way to approach this problem of next word prediction?
Any suggestions and guidance would be welcome.
Thank you in advance
Hi Sumanth! I believe you are already on the right track by fine-tuning GPT-2. The key difference is that GPT was trained with causal/autoregressive attention, which means it is specifically trained to predict the next word without access to any words to the right of the current position (unlike BERT, which sees context on both sides).
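As a quick illustration (a sketch using the stock gpt2 checkpoint; a fine-tuned legal model would be used the same way), the next-word candidates can be read off the logits at the last position:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Causal LM: the prediction for each position depends only on tokens to its left
inputs = tokenizer("use of", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]            # distribution over the next token
top = torch.topk(next_token_logits, k=5)
print([tokenizer.decode(i) for i in top.indices])
```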
The different models and their architectures are depicted in this chart:
@marshmellow77 a question: is there a way to fine-tune and use T5 or BigBird for this next-word prediction task? I am unable to find tutorials for using these models for next-word prediction.