I have a simple task:
Input: Fill in the blank: “The banana is colored [BLANK].”
Desired output: yellow
In general,
- [BLANK] may be multiple tokens
- [BLANK] can occur anywhere in the sequence
- Completion should use bidirectional context
I have tried
- flan-t5-xl
- gpt2
- gptneo-1.3B
I feel like I should be able to get good performance on this task without additional fine-tuning, but so far I haven't gotten good results from the models I've tried. Is fine-tuning likely required for this task? (It's also possible my prompt doesn't match what the model saw during fine-tuning.) Ideally I'd be able to use a model around 1 GB in size.
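In case it helps, here is a minimal sketch of the kind of prompting I've been trying with flan-t5. The `<extra_id_0>` sentinel format is my assumption, carried over from T5's span-corruption pretraining; the instruction-tuned format may expect something different, which could be part of the problem:

```python
# Minimal sketch: asking flan-t5 to fill a blank using a T5-style sentinel token.
# Assumption: flan-t5 may still respond to the <extra_id_0> format it inherits
# from T5's span-corruption pretraining.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

prompt = "The banana is colored <extra_id_0>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```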
General-purpose LLMs are surprisingly unreliable at tasks like this, and their strengths and weaknesses vary a lot depending on the particular model and its settings.
A workaround would be to fine-tune a generic LLM, reuse an LLM that someone else has already fine-tuned for this kind of task, or find an LLM that happens to be relatively well suited for it.
I'm not familiar with it myself, but I hear that function calling and similar features can handle routine tasks like this well; it may be worth searching for.
There may be other options besides LLMs.
Recently released LLMs that might be suitable are Qwen 2.5 0.5B or, a bit larger, Llama 3.2 3B.
An 8B-class model would make this easier, but it would exceed your size budget.
If you’re sure that you’ll use this model only for this task, why not consider BERT or other encoders pre-trained with a mask-filling objective? Those models attend to the full sequence during the forward pass, so they perform better on exactly this task: they predict the mask token(s) using the full bidirectional context.
Try bert-base-uncased as a first approach; if you need something smaller, distilled variants such as distilbert-base-uncased or distilroberta-base are trained with the same mask-filling objective.
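A minimal sketch of what that looks like with the `fill-mask` pipeline (the model name and printing only the top prediction are just one reasonable choice):

```python
# Minimal sketch: single-token mask filling with bert-base-uncased.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
preds = fill("The banana is colored [MASK].")
print(preds[0]["token_str"])  # top prediction, e.g. something like "yellow"
```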
1 - You can put multiple [MASK] tokens to be predicted, so you can fill in a complete word or phrase whether it maps to a single token or several (see the sketch after this list).
2 - GPT models use a causal (triangular) attention mask, so the token at position t only attends to tokens at positions ≤ t. When you ask a GPT model to fill the [BLANK], the prompt does contain the full context as text, but the model consumes it very differently from the way BERT does.
3 - GPT models are trained on Next Token Prediction (NTP), while the original BERT is trained on Masked Language Modelling (MLM) plus Next Sentence Prediction. NTP predicts the next token at every position, while MLM predicts the masked tokens from both sides of the context, which is literally… filling a blank.
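To illustrate point 1, here is a minimal sketch of filling a multi-token blank by inserting several [MASK] tokens. Taking the top prediction at each masked position independently is an assumption for simplicity; joint or iterative decoding usually gives better results:

```python
# Minimal sketch: multi-token mask filling with bert-base-uncased.
# One [MASK] per target token; each masked position is decoded independently
# (a simplifying assumption; iterative refilling usually works better).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The banana is colored [MASK] [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked positions and take the argmax token at each one.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```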