I want to fine-tune my LLM (falcon-7b) to learn to stop: which strategy?

Hi everyone, I want to fine-tune an LLM whose answers can vary in length (my task is a kind of multi-label, multi-class classification: I want it to answer with none, some, or all of the 13 values I ask it about).
But I want to teach it to stop when it should, so that it doesn't always list all 13 values.
So in my fine-tuning code I do something like this:

```python
import transformers

# The model card says to reuse the EOS token as the padding token;
# this has to be set before the data collator is created
tokenizer.pad_token = tokenizer.eos_token

dataset = dataset.shuffle().map(
    lambda samples: tokenizer(samples["PROMPT"]), batched=True
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=20,
        num_train_epochs=5,
        learning_rate=2e-4,
        fp16=False,
        logging_steps=1,
        output_dir="lora-falcon40b-4bits-train",
        save_steps=50,
        save_total_limit=3,
        run_name="train1",
    ),
    # Causal-LM collator: pads the batch, copies input_ids into labels,
    # and replaces every pad token in the labels with -100
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Here PROMPT is the dataset column holding the prompts I want the model to learn.
The point is that, because of the data collator combined with pad_token = eos_token, my training labels are my PROMPT's tokens properly tokenized, but every padding token gets the famous value -100. The documentation explains that positions labeled -100 are ignored by the loss. And since padding and EOS are the same token, the genuine stop signal at the end of each prompt gets masked too. But I don't want the stop signal to be ignored: I want the model to learn that it must stop where I say so.
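Here is a minimal way to see the masking happen (a sketch, assuming the stock tiiuae/falcon-7b tokenizer; the two example strings are made up):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Because pad_token == eos_token, the collator cannot tell padding apart
# from a genuine end-of-text token, so it masks both with -100
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = tokenizer.eos_token

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Two examples of different lengths, each ending with an explicit EOS,
# so the shorter one gets padded by the collator
features = [
    tokenizer(t + tokenizer.eos_token)
    for t in ["yellow, blue", "yellow, blue, red, green, purple"]
]
batch = collator(features)

# Every position holding eos_token_id (the padding AND the real stop
# signal) now reads -100 in the labels, i.e. it is ignored by the loss
print(batch["labels"])
```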
So my question is: which strategy would be best?
Should I set aside the data collator, keep my padding (or EOS) tokens in the labels, and fine-tune the LLM with padding at the end of the prompts?
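Concretely, here is what I imagine for that first strategy (a rough sketch; tokenize_with_stop and max_length=512 are placeholder choices): build the labels myself so the appended EOS stays supervised and only the real padding is masked.

```python
def tokenize_with_stop(samples, tokenizer, max_length=512):
    # Append EOS explicitly so every example ends with a supervised stop
    # signal (note: truncation can cut the EOS off for over-long examples)
    texts = [t + tokenizer.eos_token for t in samples["PROMPT"]]
    enc = tokenizer(texts, padding="max_length", truncation=True,
                    max_length=max_length)
    # attention_mask is 1 on real tokens (including the appended EOS) and 0
    # on padding, so masking on it keeps the EOS supervised while the
    # padding still gets -100
    enc["labels"] = [
        [tok if m == 1 else -100 for tok, m in zip(ids, mask)]
        for ids, mask in zip(enc["input_ids"], enc["attention_mask"])
    ]
    return enc

dataset = dataset.shuffle().map(
    lambda s: tokenize_with_stop(s, tokenizer),
    batched=True,
    remove_columns=["PROMPT"],
)
```

With labels built this way I would pass data_collator=transformers.default_data_collator to the Trainer, so that nothing overwrites them with -100 afterwards.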
Or should I instead end my prompts with a special token, for instance ">>ANSWER<<" (as in special_tokens_map.json · vilsonrodrigues/falcon-7b-instruct-sharded at main)?
But I suspect that during pretraining the model attached a different meaning to this ">>ANSWER<<" token than the one I want, and I don't have many examples for my fine-tuning, so it may not have enough data to learn the new use of ">>ANSWER<<".
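Here is roughly what I would check for that second strategy (a sketch; model and the commented generate() call come from my own setup, and the None/unk handling is my assumption about the tokenizer API):

```python
# Falcon's tokenizer ships several >>...<< control tokens, so >>ANSWER<<
# may already have its own id; if not, add it and resize the embeddings
# before fine-tuning
stop_id = tokenizer.convert_tokens_to_ids(">>ANSWER<<")
if stop_id is None or stop_id == tokenizer.unk_token_id:
    tokenizer.add_special_tokens({"additional_special_tokens": [">>ANSWER<<"]})
    model.resize_token_embeddings(len(tokenizer))
    stop_id = tokenizer.convert_tokens_to_ids(">>ANSWER<<")

# At inference, generation could then stop on that token instead of EOS:
# outputs = model.generate(**inputs, max_new_tokens=64, eos_token_id=stop_id)
```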
Can someone enlighten me?

Thank you,