How to use begin_suppress_tokens with model that enforces BOS token (llama 1)

rowan250 · November 7, 2023, 4:19am

I use begin_suppress_tokens as a crude way to prevent a chatbot from parroting back an exact copy of the chat history.
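For context, here is a simplified pure-Python sketch of what I expect begin_suppress_tokens to do. It is not the transformers implementation. The helper name `suppress_at_begin` and the toy scores are hypothetical. At the step where the sequence length equals the index where generation begins, the listed token ids get a score of -inf so greedy decoding cannot pick them. At every later step the scores are left alone.

```python
import math

def suppress_at_begin(input_ids, scores, begin_suppress_tokens, begin_index):
    """Hypothetical helper: mask the listed token ids to -inf, but only at the
    step where len(input_ids) == begin_index (the first generated position)."""
    out = list(scores)  # copy so the caller's scores are not modified
    if len(input_ids) == begin_index:
        for tok in begin_suppress_tokens:
            out[tok] = -math.inf
    return out

# Toy example: a 3-token prompt and a 5-token vocabulary.
prompt = [1, 3, 4]
scores = [0.1, 0.5, 2.0, 0.3, 0.2]
first = suppress_at_begin(prompt, scores, [2], begin_index=len(prompt))
later = suppress_at_begin(prompt + [1], scores, [2], begin_index=len(prompt))
best_first = max(range(len(first)), key=first.__getitem__)  # id 2 is masked, so 1 wins
```

Only one position is ever masked, which is why the feature works as a "don't start with X" filter rather than a general ban.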

This works fine with GPT-2 and GPT-J, but not with LLaMA-1.

I think it's because LLaMA returns <s> as the first token unconditionally, so anything passed to begin_suppress_tokens (which, as far as I can tell, only affects the first generated token) is effectively ignored.
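To illustrate the suspected failure mode, here is a toy greedy loop. It is a sketch under the assumption stated above (that the model's first generated token is <s>), not a claim about what LLaMA actually does. The vocabulary ids and `toy_scores` are hypothetical. If the single masked position is spent on an extra leading token, the token I wanted to suppress is free to appear one step later.

```python
import math

# Toy vocabulary ids (hypothetical, for illustration only).
BOS, A, B = 0, 1, 2

def toy_scores(step, emits_bos_first):
    """Hypothetical next-token scores: optionally BOS first, then A is preferred over B."""
    if emits_bos_first and step == 0:
        return [5.0, 1.0, 0.5]
    return [-1.0, 2.0, 1.0]

def greedy_generate(prompt, max_new_tokens, begin_suppress_tokens, emits_bos_first):
    ids = list(prompt)
    begin_index = len(prompt)  # index of the first generated position
    for step in range(max_new_tokens):
        scores = toy_scores(step, emits_bos_first)
        if len(ids) == begin_index:  # suppression applies to this one step only
            for tok in begin_suppress_tokens:
                scores[tok] = -math.inf
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return ids[begin_index:]  # return only the newly generated ids

no_bos = greedy_generate([A, B], 2, [A], emits_bos_first=False)
with_bos = greedy_generate([A, B], 2, [A], emits_bos_first=True)
```

Without the extra leading token, A is blocked at the first step (`no_bos` is `[B, A]`). With it, the mask lands on the BOS step and A comes out immediately afterwards (`with_bos` is `[BOS, A]`).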

I tried loading the tokenizer with add_bos_token=False. The output then excludes <s>, but the token(s) that are supposed to be suppressed at the start are still generated.
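One related detail: if the ids to suppress are obtained by encoding text with a tokenizer that prepends BOS, the list starts with the BOS id rather than the content token. The hypothetical helper below is a sketch, assuming the BOS id is known (for example from `tokenizer.bos_token_id`). It drops a leading BOS and keeps only the first content token, since that is the only position begin_suppress_tokens acts on. The result would then be passed as `begin_suppress_tokens=...` to `generate`. The ids in the usage lines are made up.

```python
def begin_suppress_ids(encoded_ids, bos_token_id):
    """Hypothetical helper: strip a leading BOS id and return the first
    content token as a one-element list (empty if there is none)."""
    ids = list(encoded_ids)
    if ids and ids[0] == bos_token_id:
        ids = ids[1:]
    return ids[:1]

# Made-up ids: 1 stands in for BOS, 100 and 200 for content tokens.
with_bos = begin_suppress_ids([1, 100, 200], bos_token_id=1)
without_bos = begin_suppress_ids([100, 200], bos_token_id=1)
only_bos = begin_suppress_ids([1], bos_token_id=1)
```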

Any ideas on how to make begin_suppress_tokens work on this model? Thanks!
