Fine-tuning with/without LoRA

Fine-tuning an 8G model on chat data, with and without LoRA.

With LoRA:

    from peft import LoraConfig
    from trl import SFTTrainer

    peft_config = LoraConfig(
        lora_alpha=128,
        lora_dropout=0.05,
        r=256,
        bias="none",
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    )
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        peft_config=peft_config,
        max_seq_length=max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,   # the chat template already adds special tokens
            "append_concat_token": False,  # no extra separator token between packed samples
        },
    )

Without LoRA:

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        max_seq_length=max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,
            "append_concat_token": False,
        },
    )
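For reference, both runs share roughly this setup. This is a minimal sketch with placeholder values; the model id, dataset, and hyperparameters shown here are illustrative, not the actual ones used:

    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

    # Placeholders for illustration only; substitute the real model and chat data.
    model_id = "meta-llama/Meta-Llama-3-8B"
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    dataset = load_dataset("json", data_files="chat_data.json", split="train")

    max_seq_length = 2048  # placeholder
    training_args = TrainingArguments(
        output_dir="sft-output",
        num_train_epochs=3,             # placeholder
        per_device_train_batch_size=1,  # placeholder
        learning_rate=2e-4,             # placeholder
        bf16=True,
    )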

Everything else, including all hyperparameters, is the same. LoRA gave really good results, while the run without LoRA produced unrelated responses.

Does anyone have experience with this, or clues about the cause?

Isn't this the expected result? LoRA should be easier to train, so if you are using the same number of epochs, I believe LoRA should learn faster.
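
For intuition, here is a minimal sketch that compares trainable parameter counts with and without the LoRA config above. facebook/opt-125m is just a small stand-in model for illustration, and the "all-linear" shorthand assumes a recent peft version (>= 0.8):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Small stand-in model; the gap is far more dramatic at 8B scale,
    # since the LoRA fraction shrinks as hidden size grows for a fixed r.
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

    # Full fine-tuning updates every weight in the model.
    full_params = sum(p.numel() for p in model.parameters())
    print(f"full fine-tune: {full_params:,} trainable parameters")

    # LoRA freezes the base weights and trains only small adapter matrices.
    peft_config = LoraConfig(
        lora_alpha=128,
        lora_dropout=0.05,
        r=256,
        bias="none",
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    )
    lora_model = get_peft_model(model, peft_config)
    lora_model.print_trainable_parameters()  # trainable vs. total, and the percentage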