I want to fine-tune my LLM (falcon-7b) to learn to stop: which strategy?

Hi everyone, I want to fine-tune an LLM whose answers can be of varying length (my task is a kind of multi-label, multi-class classification: I want it to answer with none, some, or all of the 13 values I ask about).
But I also want to teach it to stop when it should, so that it doesn't always list all 13 values.
So in my fine-tuning code I do something like:

```python
tokenizer.pad_token = tokenizer.eos_token  # the model card says to do so

dataset = dataset.shuffle().map(
    lambda samples: tokenizer(samples["PROMPT"]), batched=True
)

trainer = transformers.Trainer(
    # ...
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    # ...
)
```
where PROMPT holds the prompts I want the model to learn from.
The problem: because of the data collator combined with pad_token = eos_token, my training labels are my PROMPT tokens correctly tokenized, but every padding token then gets the famous "-100" value.
The documentation explains that "-100" labels are ignored by the loss, and since padding and EOS share the same token id, the EOS is masked along with the padding. But I don't want the stop signal to be ignored; I want the model to learn that it must stop when I say so.
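One workaround I've seen discussed is to build the labels yourself: copy the input ids, but only mask the padding that comes *after* the first EOS, so the EOS itself still contributes to the loss. A minimal sketch of that labeling logic (the token ids are made up for illustration; -100 is the ignore index the Trainer uses):

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def build_labels(input_ids, eos_id):
    """Copy input_ids, replacing every token *after* the first EOS with
    IGNORE_INDEX: the padding is ignored, but the EOS itself is learned."""
    labels = list(input_ids)
    try:
        first_eos = labels.index(eos_id)
    except ValueError:
        return labels  # no EOS in the sequence: nothing to mask
    for i in range(first_eos + 1, len(labels)):
        labels[i] = IGNORE_INDEX
    return labels

# Example: tokens [5, 6, 7], then EOS (id 11), then padding (also id 11)
print(build_labels([5, 6, 7, 11, 11, 11], eos_id=11))
# → [5, 6, 7, 11, -100, -100]
```

You could wrap this in a custom data collator in place of DataCollatorForLanguageModeling; the key point is just that the first EOS keeps its real label.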
So my question is: which strategy is best?
Should I set aside the data collator, keep my padding (or EOS) tokens, and fine-tune the LLM with the padding at the end of the prompts labeled normally?
Or should I instead end my prompts with a special token, for instance ">>ANSWER<<" (as in special_tokens_map.json · vilsonrodrigues/falcon-7b-instruct-sharded at main)?
But I suspect that during pre-training the model gave this ">>ANSWER<<" token a different meaning than the one I intend. I don't have many examples for my fine-tuning, so it may not have enough data to learn the new use of the ">>ANSWER<<" token.
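For what it's worth, the separator-token layout would look something like this at data-preparation time: the prompt ends with the separator, and the target ends with an explicit EOS so the stop signal is part of the supervised text. The token strings below are assumptions on my part ("<|endoftext|>" being Falcon's EOS string):

```python
SEP = ">>ANSWER<<"        # separator marking where the answer starts
EOS = "<|endoftext|>"     # assumed Falcon EOS token string

def format_example(prompt, answer):
    """Training text: prompt, separator, answer, then an explicit EOS,
    so the model sees the stop signal as part of the target."""
    return f"{prompt} {SEP} {answer}{EOS}"

print(format_example("List the matching values:", "value_1, value_4"))
# → List the matching values: >>ANSWER<< value_1, value_4<|endoftext|>
```

Whether a small fine-tuning set is enough to repurpose ">>ANSWER<<" this way is exactly the part I'm unsure about.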
Can someone enlighten me?

Thank you,