Repetition Issues in Llama Models (3:8B, 3:70B, 3.1, 3.2)

I’m extracting Inputs, Outputs, and Summaries from large legacy codebases (COBOL, RPG), but I’m facing repetition issues, especially when generating bullet points. Summaries come out fine, but sections like Inputs/Outputs often have paragraphs or words repeated 30-40 times.

I’m not sure whether this is specific to bullet-point/listing-style prompts, but the issue hasn’t occurred with any summary-style prompts.

Issues:

  • Repetition occurs mainly in bullet-point sections.
  • repetition_penalty=1.3 reduces it but causes data loss, while repetition_penalty=1.1 still lets repetition through.
  • Occurs in all Llama models (3:8B, 3:70B, 3.1, 3.2).

Tried:

  • Adjusting temperature, top-p, top-k, and repetition penalty. Increasing the temperature reduces the issue, but I don’t want it above 0.3 for my use case.
  • Updating the prompt.

Looking for a fix—maybe adjusting some other parameters? Any suggestions would be great!

Hmm… Or perhaps try different models?


If you’ve already tested adjustments to temperature, top_p, top_k, and repetition_penalty, and they haven’t resolved the repetition issue for bullet-point generation in Llama models, here are additional strategies to explore:


1. Use Beam Search with Diversity

  • Switch to beam search (instead of greedy decoding or sampling) and enable diverse beam search to explore multiple potential outputs.
  • Set parameters like num_beams (e.g., 4-8), num_beam_groups, and diversity_penalty (e.g., 1.0) to encourage diversity in the generated bullet points (see the sketch below).
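
A minimal sketch with Hugging Face transformers, assuming you can load the model locally (the model ID, prompt, and parameter values are illustrative, not tuned). Note that diversity_penalty only takes effect when num_beam_groups > 1:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; substitute whichever Llama model you are serving.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "List the Inputs of the following COBOL program as bullet points:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    num_beams=4,
    num_beam_groups=4,      # diverse beam search requires beam groups
    diversity_penalty=1.0,  # penalizes groups that pick the same tokens
    do_sample=False,        # diverse beam search is deterministic
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```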

2. Apply Token-Level Blacklisting

  • Mask or blacklist frequently repeated tokens or phrases during generation. For example, if certain technical terms are being repeated unnecessarily, explicitly exclude them after they’ve been used once in a bullet point (see the sketch below).
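
For a static blacklist, transformers’ bad_words_ids is the simplest mechanism; note it bans the sequences everywhere, so a true “exclude after first use” would need a custom LogitsProcessor. A sketch (the banned phrases are made-up examples), reusing the model and tokenizer from the sketch above:

```python
# Hypothetical terms that keep getting repeated in the output.
banned_phrases = ["WS-TOTAL-COUNTER", "PERFORM UNTIL"]
bad_words_ids = tokenizer(banned_phrases, add_special_tokens=False).input_ids

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    bad_words_ids=bad_words_ids,  # these token sequences can never be generated
)
```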

3. Modify the Context Window

  • Reduce the context window size (max_seq_length) to limit the model’s exposure to repetitive information in large codebases.
  • This forces the model to focus on the most recent context and avoids over-reliance on repeated patterns in the input.
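
With a Hugging Face tokenizer, capping the context is a one-liner (4096 is just an example cap); if you serve through Ollama instead, the equivalent knob should be the num_ctx option:

```python
# Keep only the first 4096 tokens of the (possibly repetitive) source listing.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
```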

4. Use a Second-Pass Filter

  • Incorporate a post-generation filter to remove duplicate bullet points or phrases.
  • Leverage simple keyword-based or similarity-based filtering to ensure each bullet point contains unique information.
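
A minimal post-processing sketch using only the Python standard library; the 0.85 similarity threshold is an assumption to tune on your data:

```python
from difflib import SequenceMatcher

def dedupe_bullets(bullets, threshold=0.85):
    """Keep a bullet only if it is not too similar to one already kept."""
    kept = []
    for bullet in bullets:
        if all(SequenceMatcher(None, bullet.lower(), k.lower()).ratio() < threshold
               for k in kept):
            kept.append(bullet)
    return kept

raw = ["Reads CUSTOMER-FILE", "Reads the CUSTOMER-FILE", "Writes REPORT-FILE"]
print(dedupe_bullets(raw))  # the two near-duplicates collapse to one entry
```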

5. Adjust Batch Size or Chunking

  • Process the input in smaller chunks (e.g., split the codebase into smaller logical pieces) and generate bullet points incrementally.
  • This reduces the likelihood of the model being overwhelmed by repetitive patterns in a single large input.
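
A sketch of incremental processing (the chunk size, prompt, and cobol_source variable are assumptions; model and tokenizer are as in the first sketch):

```python
def chunk_lines(source, max_lines=400):
    """Split a legacy source listing into fixed-size line chunks."""
    lines = source.splitlines()
    for i in range(0, len(lines), max_lines):
        yield "\n".join(lines[i:i + max_lines])

results = []
for chunk in chunk_lines(cobol_source):  # cobol_source: raw program text
    prompt = f"List the Inputs and Outputs in this code:\n{chunk}"
    chunk_inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**chunk_inputs, max_new_tokens=256)
    results.append(tokenizer.decode(out[0], skip_special_tokens=True))
# Merge the per-chunk results, then dedupe with the second-pass filter above.
```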

6. Apply a Masked Language Model (MLM) Approach

  • Use a combination of masked tokens and prompts to encourage the model to generate diverse bullet points. For example:
    • Mask certain repetitive words or phrases and prompt the model to fill in unique alternatives.

7. Fine-tune the Model

  • Fine-tune the Llama model on a dataset of similar legacy codebases where Inputs/Outputs are already well-formed and non-repetitive.
  • This can help the model better understand the context and avoid redundant phrases.
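
If you go this route, a parameter-efficient method like LoRA keeps the cost down. A hedged sketch with the peft library (the target modules and hyperparameters are common defaults, not tuned values):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
# Then train with your usual Trainer/SFT loop on curated, non-repetitive
# Input/Output examples from similar legacy codebases.
```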

8. Use a Different Decoding Strategy

  • Experiment with typical sampling (or alternatives such as epsilon sampling) instead of purely top-p or temperature-based sampling; note that “nucleus sampling” is simply another name for top-p.
  • These methods focus on the most likely tokens while maintaining diversity, potentially reducing repetition.
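
In transformers, typical sampling is exposed as typical_p; the values here are illustrative and stay within the low temperature you need:

```python
outputs = model.generate(
    **inputs,
    do_sample=True,
    typical_p=0.9,     # locally typical sampling
    temperature=0.3,   # stays within your 0.3 ceiling
    max_new_tokens=512,
)
```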

9. Apply a Custom Repetition Penalty

  • Implement a custom repetition penalty that penalizes the repetition of phrases or sequences, not just individual tokens.
  • For example, ban or penalize sequences of 2-3 tokens that have already appeared in the output; Hugging Face’s no_repeat_ngram_size implements the hard-ban version of this (see below).
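
The hard-ban variant is a single generate argument; a soft phrase-level penalty would need a custom LogitsProcessor:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    no_repeat_ngram_size=3,  # no 3-token sequence may repeat in the output
)
```

One caveat: a hard n-gram ban can also block legitimately repeated field names, so test it against your data.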

10. Use Early Stopping

  • Set an early stopping condition if the model starts repeating phrases excessively during generation.
  • This prevents the model from generating a large number of redundant bullet points.
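
A sketch of a custom stopping criterion built on transformers’ StoppingCriteria; the window size and the exact-match heuristic are assumptions (it also scans the prompt tokens, which may or may not be what you want):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class RepetitionStop(StoppingCriteria):
    """Stop generation once the last `window` tokens already occurred earlier."""

    def __init__(self, window: int = 20):
        self.window = window

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        seq = input_ids[0].tolist()
        if len(seq) < 2 * self.window:
            return False
        tail = seq[-self.window:]
        # Scan earlier tokens for an exact copy of the trailing n-gram.
        return any(
            seq[i:i + self.window] == tail
            for i in range(len(seq) - 2 * self.window + 1)
        )

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    stopping_criteria=StoppingCriteriaList([RepetitionStop(window=20)]),
)
```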

11. Incorporate Contextual Filters

  • Use a secondary model or rule-based system to filter out bullet points that are semantically or contextually redundant.
  • For example, remove bullet points that are paraphrases of each other.
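
A sketch using the sentence-transformers library for embedding-based paraphrase filtering; the encoder name and the 0.9 threshold are assumptions:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

def drop_paraphrases(bullets, threshold=0.9):
    """Keep a bullet only if no already-kept bullet is a near-paraphrase."""
    kept, kept_embeddings = [], []
    for bullet in bullets:
        emb = embedder.encode(bullet, convert_to_tensor=True)
        if all(util.cos_sim(emb, k).item() < threshold for k in kept_embeddings):
            kept.append(bullet)
            kept_embeddings.append(emb)
    return kept
```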

12. Explore Alternate Prompt Formats

  • Even though you’ve already tried updating the prompt, subtle changes like bullet-point templates or structured formats might still help. For example:
    • “For each Input/Output, describe it once and move to the next.”
    • “List each Input/Output only once, and avoid repeating details.”

13. Use Input Length Normalization

  • Normalize or truncate repetitive sections of the input to reduce redundancy before feeding it into the model.
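
A sketch that collapses exact duplicate lines before prompting; whether this is safe depends on your source format, so treat it as a starting point:

```python
def collapse_duplicate_lines(text):
    """Drop repeated non-blank lines, keeping the first occurrence of each."""
    seen, kept = set(), []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)
```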

14. Experiment with Different Model Sizes

  • Try smaller or larger models to see if the repetition issue persists; different sizes can behave quite differently on highly repetitive inputs.

15. Apply Response Length Penalties

  • Use response-length controls, such as a length_penalty below 1.0 with beam search or a max_new_tokens cap, to encourage shorter, more concise bullet points and reduce the room for runaway repetition (see the sketch below).
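
In transformers, length_penalty only affects beam-based decoding (values below 1.0 favor shorter sequences), and max_new_tokens is a blunt but reliable backstop; the numbers here are illustrative:

```python
outputs = model.generate(
    **inputs,
    num_beams=4,
    length_penalty=0.8,  # below 1.0 nudges beam search toward shorter outputs
    max_new_tokens=256,  # hard cap so runaway repetition cannot fill the budget
)
```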

16. Try Ensembling

  • Generate bullet points using multiple models and combine the results. This can reduce repetition by leveraging the diversity of different models.

17. Leverage External Knowledge Databases

  • Use external knowledge databases or terminology lists to ensure bullet points are concise and avoid repetition.

Implementation Steps

  1. Start with Beam Search and Diversity:
    • Implement beam search with diversity_penalty to explore multiple generation paths.
  2. Use Token Blacklisting:
    • Mask frequently repeated tokens after they are used once.
  3. Adjust the Context Window:
    • Reduce the context window to limit the impact of repetitive patterns.
  4. Apply a Second-Pass Filter:
    • Introduce a post-processing step to remove redundant bullet points.

By combining these strategies, you can reduce repetition while maintaining the integrity of the generated bullet points for Inputs/Outputs. Let me know if you’d like to dive deeper into any of these approaches!