I’m playing with a variety of LLaMA models, especially some Wizard and Guanaco 4-bit versions. All of them frequently generate text that ends abruptly, as though they hit `max_new_tokens` and just stopped.
I tried `exponential_decay_length_penalty`, but with limited luck. Maybe I’m using bad settings?
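For context, here is my mental model of what the `(start, factor)` tuple is supposed to do: once generation passes `start` new tokens, the EOS logit gets boosted so that stopping becomes exponentially more attractive. This is just a toy sketch of that idea (my own simplified version, not the library’s actual implementation, and `apply_eos_decay` is a hypothetical helper):

```python
import math

def apply_eos_decay(logits, eos_id, cur_len, start, factor):
    """Toy sketch: after `start` generated tokens, add an exponentially
    growing bonus to the EOS logit (linear in log-space = exponential in
    probability space). Not the exact transformers implementation."""
    boosted = list(logits)
    if cur_len > start:
        boosted[eos_id] += (cur_len - start) * math.log(factor)
    return boosted

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Pretend a tiny 3-token vocabulary where index 2 is EOS.
logits = [2.0, 1.0, -1.0]
eos_id = 2

# Early in generation (before `start`), EOS is unaffected.
p_early = softmax(apply_eos_decay(logits, eos_id, cur_len=10, start=50, factor=1.1))[eos_id]
# Well past `start`, the EOS probability has grown substantially.
p_late = softmax(apply_eos_decay(logits, eos_id, cur_len=100, start=50, factor=1.1))[eos_id]
```

If that picture is right, then with a `factor` close to 1.0 the penalty ramps up very slowly, which might explain the "limited luck" I’m seeing.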
Strangely, I can’t find any discussion of how to configure generation to “find a good stopping point” prior to hitting the brick wall of `max_new_tokens`. How is this supposed to work?