I have a simple task:
Input: Fill in the blank: “The banana is colored [BLANK].”
Desired output: yellow
In general,
- [BLANK] may be multiple tokens
- [BLANK] can occur anywhere in the sequence
- Completion should use bidirectional context
I have tried
- flan-t5-xl
- gpt2
- gptneo-1.3B
I feel like I should be able to get good performance on this task without additional fine-tuning, but so far I haven't gotten good results from the models I've tried. Is fine-tuning likely required for this task? (It's also possible my prompt doesn't match what the model saw during fine-tuning.) Ideally I'd be able to use a model around 1 GB in size.
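In case it helps, here is a minimal sketch of the kind of prompting I've been trying with flan-t5. The `<extra_id_0>` sentinel format is my assumption, carried over from T5's span-corruption pretraining; the instruction-tuned format may expect something different, which could be part of the problem:

```python
# Minimal sketch: asking flan-t5 to fill a blank using a T5-style sentinel token.
# Assumption: flan-t5 may still respond to the <extra_id_0> format it inherits
# from T5's span-corruption pretraining.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

prompt = "The banana is colored <extra_id_0>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```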
General-purpose LLMs are surprisingly unreliable at tasks like this, and their strengths and weaknesses vary a lot depending on the particular model and its settings.
A workaround would be to fine-tune a generic LLM, reuse an LLM that someone else has already fine-tuned for this kind of task, or find an LLM that happens to be relatively well suited for it.
I'm not familiar with it myself, but I hear that function calling and similar features can handle routine tasks like this well; it may be worth searching for.
There may be other options besides LLMs.
Recently released LLMs that might be suitable are Qwen 2.5 0.5B or, a bit larger, Llama 3.2 3B.
An 8B-class model would make this easier, but it would exceed your size budget.
If you’re sure that you’ll use this model only for this task, why not consider BERT or other encoders pre-trained with a mask-filling objective? Those models attend to the full sequence during the forward pass, so they perform better on exactly this task: they predict the mask token(s) using the full bidirectional context.
Try bert-base-uncased as a first approach; if you need something smaller, distilled variants such as distilbert-base-uncased or distilroberta-base are trained with the same mask-filling objective.
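A minimal sketch of what that looks like with the `fill-mask` pipeline (the model name and printing only the top prediction are just one reasonable choice):

```python
# Minimal sketch: single-token mask filling with bert-base-uncased.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
preds = fill("The banana is colored [MASK].")
print(preds[0]["token_str"])  # top prediction, e.g. something like "yellow"
```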
1 - You can put multiple [MASK] tokens to be predicted, so you can fill in a complete word or phrase whether it maps to a single token or several (see the sketch after this list).
2 - GPT models use a causal (triangular) attention mask, so the token at position t only attends to tokens at positions ≤ t. When you ask a GPT model to fill the [BLANK], the prompt does contain the full context as text, but the model consumes it very differently from the way BERT does.
3 - GPT models are trained on Next Token Prediction (NTP), while the original BERT is trained on Masked Language Modelling (MLM) plus Next Sentence Prediction. NTP predicts the next token at every position, while MLM predicts the masked tokens from both sides of the context, which is literally… filling a blank.
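To illustrate point 1, here is a minimal sketch of filling a multi-token blank by inserting several [MASK] tokens. Taking the top prediction at each masked position independently is an assumption for simplicity; joint or iterative decoding usually gives better results:

```python
# Minimal sketch: multi-token mask filling with bert-base-uncased.
# One [MASK] per target token; each masked position is decoded independently
# (a simplifying assumption; iterative refilling usually works better).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The banana is colored [MASK] [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked positions and take the argmax token at each one.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```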