Fine-tuning an 8G model on chat data, with and without LoRA.
With LoRA:
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=128,
    lora_dropout=0.05,
    r=256,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
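For reference, with these settings the LoRA update is scaled by lora_alpha / r = 128 / 256 = 0.5. A minimal numpy sketch of the idea behind the adapter (toy sizes, not the peft internals):

```python
import numpy as np

# Sketch of the LoRA forward pass: the frozen weight W is augmented
# with a low-rank update B @ A, scaled by lora_alpha / r.
rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 16, 16, 4, 8  # toy sizes, not the real config

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, initialized small
B = np.zeros((d_out, r))               # trainable, initialized to zero
x = rng.normal(size=(d_in,))

scaling = lora_alpha / r
y = W @ x + scaling * (B @ (A @ x))

# Because B starts at zero, the LoRA branch contributes nothing at
# initialization, so the adapted model initially matches the base model.
assert np.allclose(y, W @ x)
```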
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)
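One relevant difference between the two runs: when peft_config is passed, SFTTrainer trains only the adapter weights and freezes the base model. A rough sketch of the per-layer parameter counts (toy dimension, not the real model's shapes):

```python
# Illustrative parameter count for a single d x d linear layer.
# An actual 8B model has many such layers of varying shapes.
d, r = 4096, 256

full_ft_params = d * d    # every weight is updated in full fine-tuning
lora_params = 2 * r * d   # only A (r x d) and B (d x r) train with LoRA

# With r=256 and d=4096, LoRA trains 12.5% as many parameters per layer.
print(full_ft_params, lora_params, lora_params / full_ft_params)
```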
Without LoRA:
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)
Everything else (model, data, training parameters) is the same. LoRA gave really good results, but without LoRA the model produced unrelated responses.
Does anyone have experience with this or clues about the cause?
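For context, one variable I have not ruled out (an assumption, since training_args is not shown above): both runs used the same learning rate. A rate that works well for LoRA adapters (often around 1e-4 to 3e-4) is typically far too high for full fine-tuning and can destroy the pretrained weights. A hypothetical, more conservative configuration for the full fine-tuning run might look like:

```python
from transformers import TrainingArguments

# Illustrative settings only; the actual training_args used above are
# not shown, so these values are a sketch, not a reproduction of my setup.
full_ft_args = TrainingArguments(
    output_dir="full-ft-out",   # placeholder path
    learning_rate=1e-5,         # full FT: ~1e-5 to 5e-5 is typical
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)
```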