Repetitive Generations

What are some ways to reduce repetitive generations (same phrase repeated over and over) in large language models?

Here are the details of what I’ve tried so far:

  • Currently using the LangChain integration of vLLM
  • Tried generation parameters like frequency_penalty and presence_penalty, without much luck (see the generation sketch after this list)
  • Currently using greedy decoding for generation (not beam search)
  • Fine-tuned a pre-trained language model for question answering with QLoRA
    - Set labels for the fine-tuning data by masking out everything except the ideal generations (see the label-masking sketch after this list)
  • Working with a Korean-language model
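
For reference, here is a minimal sketch of the generation setup (using the `VLLM` wrapper from `langchain_community`; the model name is a placeholder and parameter names may differ slightly between versions):

```python
from langchain_community.llms import VLLM

# Placeholder model name; the real model is a Korean QA model fine-tuned with QLoRA.
llm = VLLM(
    model="my-korean-qa-model",
    max_new_tokens=512,
    temperature=0.0,        # greedy decoding
    use_beam_search=False,  # beam search is not used
    frequency_penalty=0.5,  # the penalties I have experimented with,
    presence_penalty=0.5,   # without much effect on the repetition
)

print(llm.invoke("질문에 답해 주세요: ..."))  # placeholder Korean QA prompt
```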

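And this is roughly how the labels are built for the fine-tuning data (a simplified sketch with placeholder names; prompt tokens get label -100 so only the ideal generation contributes to the loss):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-korean-base-model")  # placeholder

def build_example(prompt: str, answer: str, max_len: int = 1024):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # -100 is ignored by the cross-entropy loss, so the model is only trained
    # on the ideal generation, not on the prompt.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]

    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }
```
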
Is there perhaps a way of fine-tuning LLMs to steer them away from repetitive generations?

If anyone has tips or tricks I could try, I would appreciate any help.