[Suggestions and Guidance] Fine-tuning BERT models for next word prediction

Problem statement: To produce a next-word prediction model on legal text. The aim is to build an autocomplete model that makes use of the text typed so far, as well as a possible concatenation of vectors from prior clauses/paragraphs.

Current approach: Because BERT-based models are trained with masked language modeling, pretrained models such as LegalBERT did not produce good accuracy for next-word prediction when the word to be predicted was marked as [MASK]. Here is an example sentence, “use of [MASK]”, where “marked” is the next word to be predicted in place of the “[MASK]” token. (Note that there would not be words present after the mask token, only before it.)

I am currently approaching the problem as a sequence classification task, where the labels are the token ids of the words to be predicted next. I will also attempt to fine-tune GPT-2 on the legal text using run_clm.py from the Hugging Face examples directory.
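To make the classification framing concrete, here is a minimal sketch of the data preparation it implies: each training example is a token prefix, and the label is the id of the word that follows. A toy whitespace vocabulary stands in for a real BERT tokenizer (all names here are hypothetical; in practice you would use a `transformers` `AutoTokenizer`).

```python
# Sketch of the "next word as classification label" data setup.
# Toy whitespace vocab; a real setup would use a BERT tokenizer instead.

def build_next_word_pairs(text, vocab):
    """Turn a sentence into (prefix_ids, next_word_id) training pairs."""
    ids = [vocab[w] for w in text.split()]
    pairs = []
    for i in range(1, len(ids)):
        # The label is the id of the word the model should predict next.
        pairs.append((ids[:i], ids[i]))
    return pairs

sentence = "the use of force is prohibited"
vocab = {w: i for i, w in enumerate(sentence.split())}
pairs = build_next_word_pairs(sentence, vocab)
# e.g. the first pair is the prefix ["the"] with label id of "use"
```

One caveat of this framing: the label space is the whole vocabulary, so the classification head ends up re-implementing a language-modeling head.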

Is there a better way to approach this problem of next word prediction?
Any suggestions and guidance would be welcome.
Thank you in advance


Hi Sumanth! I believe you are already on the right track by fine-tuning GPT-2. The difference is that GPT was trained using causal/autoregressive attention. This means that GPT is specifically trained to predict the next word without having access to the words to the right of the current position (unlike BERT).
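The attention difference can be illustrated with a small NumPy sketch: BERT's bidirectional attention lets every token attend to every other token, while GPT's causal attention is a lower-triangular mask, so a token can never "peek" at the words it is asked to predict.

```python
import numpy as np

def bidirectional_mask(n):
    """BERT-style attention: every position may attend to every position."""
    return np.ones((n, n), dtype=int)

def causal_mask(n):
    """GPT-style attention: position i may only attend to positions <= i,
    so the model never sees the word it has to predict next."""
    return np.tril(np.ones((n, n), dtype=int))

# Row i of the causal mask has ones only up to column i.
print(causal_mask(4))
```

This is why an off-the-shelf masked-LM struggles when [MASK] sits at the very end of the input: it was pretrained expecting right-hand context that autocomplete can never provide.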

The different models and their architectures are depicted in this chart:

Long story short: you should see better results with GPT-2. Let us know how it goes.



Hey, thanks for the prompt reply. I will focus my attempts more on autoregressive models.

@marshmellow77, a question: is there a way to fine-tune and use T5 or BigBird for this next-word prediction task? I have been unable to find tutorials for using these models for next-word prediction.
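For T5 specifically, one common way to cast the task is T5's text-to-text format: the model is pretrained with span corruption using sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...), so next-word prediction can be framed as a one-word corrupted span at the end of the prefix. Here is a hedged string-formatting sketch (the helper name is hypothetical; the sentinel tokens are T5's actual convention):

```python
# Sketch: casting next-word prediction in T5's text-to-text format.
# T5 marks corrupted spans with sentinel tokens; here the "span" is
# simply the final word the model should fill in.

def make_t5_example(prefix_words, next_word):
    """Format a (prefix, next word) pair as T5 source/target strings."""
    source = " ".join(prefix_words) + " <extra_id_0>"
    target = "<extra_id_0> " + next_word
    return source, target

src, tgt = make_t5_example(["use", "of"], "force")
# src is "use of <extra_id_0>"; tgt is "<extra_id_0> force"
```

Whether this matches T5's pretraining distribution well enough to beat a causal model like GPT-2 for autocomplete is an empirical question worth testing.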

Yes, and it is actually pretty easy thanks to a script provided by Hugging Face: transformers/run_clm.py at master · huggingface/transformers · GitHub

You can use this script to fine-tune models for causal language modeling (i.e., next word prediction) on a text file or a dataset.
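A typical invocation looks roughly like the following (file paths and hyperparameters are placeholders, so treat this as a sketch rather than a recipe; check `python run_clm.py --help` for the current flags):

```shell
# Fine-tune GPT-2 on a plain-text legal corpus with the Hugging Face
# causal language modeling example script. Paths are hypothetical.
python run_clm.py \
    --model_name_or_path gpt2 \
    --train_file legal_corpus_train.txt \
    --validation_file legal_corpus_val.txt \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --output_dir ./gpt2-legal
```

The resulting checkpoint in `--output_dir` can then be loaded for generation to serve next-word suggestions.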