Stopping generation before max_new_tokens

I’m experimenting with a variety of LLaMA models, especially some 4-bit Wizard and Guanaco variants. All of them frequently generate text that ends abruptly, as though they hit max_new_tokens and just stopped.

I tried exponential_decay_length_penalty, but with limited success. Maybe my settings are bad?

Strangely, I can’t find any discussion of how to configure generation to “find a good stopping point” prior to hitting the brick wall of max_new_tokens. How is this supposed to work?
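For context on what I’ve understood so far (please correct me if this is wrong): generation normally ends early when the model emits the tokenizer’s eos_token_id, and when that doesn’t happen, people seem to add custom stopping criteria that watch for a stop sequence in the generated ids. Here is a minimal pure-Python sketch of that per-step check; the token ids and stop strings are made up for illustration, not taken from any real tokenizer:

```python
# Sketch of the check a custom stopping criterion performs after each
# decode step: stop as soon as the tail of the generated ids matches a
# stop sequence (e.g. the EOS token, or an instruction-format marker
# like "\n### Human:" for Wizard/Guanaco-style prompts).

def ends_with_stop_sequence(generated_ids, stop_sequences):
    """Return True if generated_ids ends with any stop sequence."""
    for stop in stop_sequences:
        if len(generated_ids) >= len(stop) and generated_ids[-len(stop):] == stop:
            return True
    return False

# Hypothetical ids: pretend 2 is EOS and [13, 835, 12968] encodes a
# stop string like "\n### Human".
stop_sequences = [[2], [13, 835, 12968]]

print(ends_with_stop_sequence([5, 9, 2], stop_sequences))            # True: hit EOS
print(ends_with_stop_sequence([5, 13, 835, 12968], stop_sequences))  # True: hit stop string
print(ends_with_stop_sequence([5, 9, 7], stop_sequences))            # False: keep going
```

Is something like this the intended mechanism, or is there a built-in generation setting I’m missing?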
