How to prevent LLM from generating multiple rounds of conversation?

wyzxxywl · July 7, 2023, 7:25pm

When generating responses, you can set the EOS token to "User: ". For example: inference_config.eos_token_id = tokenizer("User: ")["input_ids"]. One caveat is that "User: " can also appear inside ordinary text, not only as the turn prefix, so I would change the prefix to something more distinctive, such as "###User: ", during finetuning.
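As a complement to the EOS approach, here is a minimal sketch of a post-processing fallback: cut the decoded output at the first turn marker. It is plain Python and does not depend on any library. The helper name truncate_at_stop and the default markers are my own assumptions for illustration, not part of Transformers. It may be useful when the marker tokenizes into several ids, in which case a single EOS id might not catch it.

```python
from typing import Sequence


def truncate_at_stop(
    text: str,
    stop_strings: Sequence[str] = ("###User: ", "User: "),  # assumed markers
) -> str:
    """Return `text` up to (not including) the earliest stop string.

    Hypothetical helper: apply it to the decoded model output so that only
    the first assistant turn is kept.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all markers
    return text[:cut].rstrip()


# The model kept going and invented another round of conversation.
raw = "Paris is the capital of France.\nUser: and Germany?\nAssistant: Berlin."
first_turn = truncate_at_stop(raw)

# With only the distinctive marker, an ordinary "User: " in the answer survives.
raw2 = "Fill in the User: field.###User: thanks"
kept = truncate_at_stop(raw2, stop_strings=("###User: ",))
```

The second call illustrates the caveat above: with the plain "User: " marker the answer would be cut in the middle of a sentence, while "###User: " only matches the real turn boundary. Recent versions of Transformers also appear to accept a stop_strings argument in generate (passed together with the tokenizer), which may be simpler if your version supports it.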
