Hello everyone, does anyone know the best way to stop text generation from an LLM when running it locally using Hugging Face? I’m not referring to setting a strict MAX TOKEN limit, but rather stopping it more naturally. It’s impressive how ChatGPT or Claude halt their generation in a smart way. How can this be achieved when running an LLM locally?
Parameters like max_new_tokens are a hard cutoff, which often leads either to the answer being cut off mid-sentence or to the model rambling on. Ideally, the model would stop on its own once it has adequately answered the user's question.
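For context, here is roughly what I'm running at the moment, just a minimal sketch (the model name and prompt are placeholders, I see the same behaviour with other causal LMs). The only stopping control is the fixed token budget:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model -- any local causal LM shows the same issue for me
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain what a stopping criterion is in one short paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The only stopping control here is the hard token budget:
# too small and the answer gets cut off, too large and generation keeps going.
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

What I'm after is a way to make this stop "naturally" at the end of a complete answer rather than at an arbitrary token count.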
Any suggestions, or a working demo?
Thank you!