I use `begin_suppress_tokens` as a crude way to prevent a chatbot from parroting back an exact copy of the chat history.
This works fine with GPT-2 and GPT-J, but not with LLaMA-1.
I think it’s because LLaMA returns `<s>` as the first token unconditionally, so anything passed to `begin_suppress_tokens` (which, if I understand correctly, only affects the first generated token) is effectively ignored.
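For concreteness, here is a minimal sketch of what I’m doing (the checkpoint name, prompt, and suppressed ID are placeholders for my actual setup):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder LLaMA-1 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Hello there!\nBot:"  # placeholder chat history
inputs = tokenizer(prompt, return_tensors="pt")

# Suppress the first token of the history so the reply can't start by
# copying it verbatim; begin_suppress_tokens should only apply to the
# first generated position.
history_ids = tokenizer(prompt, add_special_tokens=False).input_ids
output = model.generate(
    **inputs,
    max_new_tokens=50,
    begin_suppress_tokens=[history_ids[0]],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```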
I tried loading the tokenizer with `add_bos_token=False`, and the output then excludes `<s>`, but the token(s) that are supposed to be suppressed at the start are still generated.
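Roughly, that second attempt looks like this (continuing from the sketch above):

```python
# Reload the tokenizer without the BOS token (same placeholder checkpoint).
tokenizer = AutoTokenizer.from_pretrained(model_name, add_bos_token=False)
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=50,
    begin_suppress_tokens=[history_ids[0]],
)
# The decoded text no longer starts with <s>, yet the token I tried to
# suppress still appears at the start of the generation.
print(tokenizer.decode(output[0]))
```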
Any ideas on how to make `begin_suppress_tokens` work with this model? Thanks!