Hi everyone, I want to fine-tune an LLM whose answers can have variable length (my task is a kind of multi-label, multi-class classification: I want it to answer with none, some, or all of the 13 values I ask about).
But I also want it to learn to stop when it needs to, so that it does not always list all 13 values.
So in my fine-tuning code I do something like this:
```python
import transformers

# Tokenize the prompts (the PROMPT column holds the full training text)
dataset = dataset.shuffle().map(
    lambda samples: tokenizer(samples["PROMPT"]), batched=True
)

# The model card says to use the EOS token as the padding token
tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=20,
        num_train_epochs=5,
        learning_rate=2e-4,
        fp16=False,
        logging_steps=1,
        output_dir="lora-falcon40b-4bits-train",
        save_steps=50,
        save_total_limit=3,
        run_name="train1",
    ),
    # Causal-LM collator: copies input_ids into labels and masks padding with -100
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
Here PROMPT contains the texts I want the model to learn.
The issue is that, because of this data collator combined with pad_token = eos_token, my labels are my PROMPT tokens correctly tokenized, but every padding token gets the famous -100 value.
The documentation explains that tokens labelled -100 are ignored by the loss. But I don't want the stop signal to be ignored: I want the model to learn that it needs to stop when I say so.
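To make the issue concrete, here is a minimal sketch of what I mean (toy examples, not my real data; I'm assuming the vilsonrodrigues/falcon-7b-instruct-sharded tokenizer, and the exact IDs will differ):

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(
    "vilsonrodrigues/falcon-7b-instruct-sharded"
)
tokenizer.pad_token = tokenizer.eos_token  # as the model card recommends

collator = transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Two toy examples of different lengths; the shorter one gets padded with EOS
features = [tokenizer("value_1, value_2, value_3"), tokenizer("value_1")]
batch = collator(features)

print(batch["input_ids"])  # padding positions hold tokenizer.eos_token_id
print(batch["labels"])     # those same positions are -100, so EOS is never a target
```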
So my question is: which strategy would be best?
Should I set the data collator aside, keep my padding (or EOS) tokens, and fine-tune the LLM on prompts padded at the end? Something like the sketch below is what I have in mind.
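Or, as a variant of this (untested, and the "[PAD]" token is my own assumption, not something from the model card): keep the collator, but give the tokenizer a dedicated pad token and append the EOS explicitly, so that only real padding is masked and the final EOS stays a learnable label.

```python
# Use a dedicated pad token so only real padding gets masked with -100;
# "[PAD]" is a new token (an assumption), so the embedding matrix must be resized.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

def tokenize_with_eos(samples):
    # Append the stop signal to every example so it shows up in the labels
    texts = [text + tokenizer.eos_token for text in samples["PROMPT"]]
    return tokenizer(texts)

dataset = dataset.shuffle().map(tokenize_with_eos, batched=True)

# DataCollatorForLanguageModeling now masks only the "[PAD]" positions,
# so the EOS at the end of each example contributes to the loss.
```

I'm not sure how well resizing the embeddings plays with my 4-bit + LoRA setup, though.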
Or should I instead end my prompts with a special token, for instance ">>ANSWER<<" (as in special_tokens_map.json · vilsonrodrigues/falcon-7b-instruct-sharded at main)?
But I suspect that during its pre-training the model gave this ">>ANSWER<<" token a different meaning from the one I want. I do not have that many examples for my fine-tuning, so maybe it will not have time to learn the new use of the ">>ANSWER<<" token.
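What I picture for that second option is roughly this (again untested; I'm assuming ">>ANSWER<<" is already registered as an additional special token, as the special_tokens_map.json suggests, and the inference prompt is just a placeholder):

```python
# Mark the end of each training example with the existing ">>ANSWER<<" special token
# instead of EOS, and stop generation on its ID at inference time.
stop_token = ">>ANSWER<<"
assert stop_token in tokenizer.additional_special_tokens  # assumption: already in the vocab

def tokenize_with_stop(samples):
    texts = [text + stop_token for text in samples["PROMPT"]]
    return tokenizer(texts)

dataset = dataset.shuffle().map(tokenize_with_stop, batched=True)

# At inference, stop on ">>ANSWER<<" rather than on EOS
stop_id = tokenizer.convert_tokens_to_ids(stop_token)
inputs = tokenizer("my classification question...", return_tensors="pt").to(model.device)  # hypothetical prompt
outputs = model.generate(**inputs, eos_token_id=stop_id, max_new_tokens=64)
```

Since ">>ANSWER<<" is not the pad token, the collator would not mask it with -100, so it would at least be trainable.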
Could someone enlighten me?
Thank you,