What are some ways to reduce repetitive generations (same phrase repeated over and over) in large language models?
Here are some details about what I’ve tried so far:
- Currently using the LangChain integration of vLLM
- Tried generation parameters like `frequency_penalty` and `presence_penalty` without much luck (see the first sketch below for how I set them)
- Currently using the default greedy sampling method for generation (not using beam search)
- Fine-tuned a pre-trained language model for question answering (QLoRA)
- Set labels for the fine-tuning data by masking everything but the ideal generations (see the second sketch below)
- Korean model
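
For reference, this is roughly how I'm setting the generation parameters through the LangChain wrapper. It's a minimal sketch assuming the `langchain_community` `VLLM` class; the model name is just a placeholder for my fine-tuned checkpoint:

```python
from langchain_community.llms import VLLM

llm = VLLM(
    model="my-org/my-korean-qa-model",  # placeholder for the fine-tuned checkpoint
    max_new_tokens=512,
    # Penalties I experimented with (didn't help much):
    presence_penalty=0.5,
    frequency_penalty=0.5,
    # Effectively greedy at the moment; switching to temperature / top-p / top-k
    # sampling is something I haven't really explored yet:
    temperature=0.0,
    # top_p=0.9,
    # top_k=50,
)

print(llm.invoke("질문: ..."))  # Korean QA prompt goes here
```

I could also drop down to the raw `vllm` API if needed; I believe recent vLLM versions accept a `repetition_penalty` in `SamplingParams`, but I haven't tested that yet.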
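
And this is roughly how the label masking for the QLoRA fine-tuning data looks, so that the loss is only computed on the ideal generation (again a sketch, with a placeholder model name):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-org/my-korean-base-model")  # placeholder

def build_example(prompt: str, answer: str, max_len: int = 1024):
    """Tokenize a QA pair and mask the prompt tokens out of the loss."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    # EOS is appended so the model also sees where the answer should stop.
    answer_ids = tokenizer(answer + tokenizer.eos_token, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # -100 is ignored by the cross-entropy loss, so only the ideal generation
    # (the answer tokens) contributes to the training signal.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```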
Is there a way to fine-tune LLMs so that they are steered away from repetitive generations?
If anyone has tips or tricks I could try, I would appreciate the help.