I’m playing with a variety of LLaMA models, especially some 4-bit Wizard and Guanaco variants. All of them frequently generate text that ends abruptly, as though they hit
max_new_tokens and just stopped.
I’ve experimented with exponential_decay_length_penalty, but with limited luck. Maybe I’m using bad settings?
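For reference, this is roughly the shape of what I’ve been trying. The specific numbers (start index, decay factor, token count) are just examples I picked, not values I’m claiming are right:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    # Tuple of (start_index, decay_factor): once generation passes
    # start_index new tokens, the EOS logit gets boosted by
    # decay_factor ** (current_length - start_index), nudging the model
    # toward ending the text before max_new_tokens cuts it off.
    exponential_decay_length_penalty=(256, 1.05),
    eos_token_id=2,  # 2 is </s> for LLaMA-family tokenizers, as I understand it
)
```

My understanding is that a decay_factor only slightly above 1.0 ramps up gently, while larger values force an ending much sooner after start_index — but I haven’t found guidance on sensible ranges.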
Strangely, I can’t find any discussion of how to configure generation to “find a good stopping point” prior to hitting the brick wall of
max_new_tokens. How is this supposed to work?