How to use begin_suppress_tokens with a model that enforces a BOS token (LLaMA-1)

I use begin_suppress_tokens as a crude way to prevent a chatbot from parroting back an exact copy of the chat history.

This works fine with GPT-2 and GPT-J, but not with LLaMA-1.

I think it's because LLaMA unconditionally emits `<s>` as the first token, so anything passed to begin_suppress_tokens (which, as I understand it, only affects the first generated token) is effectively ignored.

I tried loading the tokenizer with add_bos_token=False, and the output then excludes `<s>`, but the token(s) that are supposed to be suppressed at the start are still generated.

Any ideas on how to make begin_suppress_tokens work on this model? Thanks!
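In case it helps, here is the workaround I've been sketching: a custom logits processor that bans the unwanted tokens at a chosen sequence length, so the "begin" position can be shifted past the forced `<s>`. This is only a sketch and untested against a real LLaMA checkpoint; `SuppressAtFirstStep`, `suppress_ids`, and `begin_index` are my own names, not part of the transformers API.

```python
import torch

class SuppressAtFirstStep:
    """Hypothetical stand-in for begin_suppress_tokens: masks the given
    token ids with -inf, but only at the step where the current sequence
    length equals begin_index (the position we treat as "first")."""

    def __init__(self, suppress_ids, begin_index):
        self.suppress_ids = list(suppress_ids)
        self.begin_index = begin_index

    def __call__(self, input_ids, scores):
        # input_ids.shape[1] is the current sequence length; when it
        # equals begin_index, the token being sampled right now is the
        # one we want to filter.
        if input_ids.shape[1] == self.begin_index:
            scores = scores.clone()
            scores[:, self.suppress_ids] = float("-inf")
        return scores
```

The idea would be to pass this via `logits_processor=LogitsProcessorList([SuppressAtFirstStep(banned_ids, prompt_len + 1)])` to `model.generate`, with the `+ 1` accounting for the model's unconditional `<s>`, but I haven't verified that the off-by-one is right.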