LoRA assumes a relatively “well-behaved” base model: it only learns a low-rank correction on top of frozen weights, so it can steer capabilities the base already has but can't build new ones from scratch. If the base isn’t instruction-tuned or capable in your task domain, LoRA doesn’t get enough leverage to shift it into useful territory, especially without supervised signals.
Try increasing the batch size and r, and decreasing the learning rate. You could also switch to an instruction-tuned base model, or warm up the base with a round of continued pre-training on in-domain text before applying LoRA. Also look into QLoRA; it quantizes the base model to 4-bit and trains LoRA adapters on top of it, so the setup is a bit different. A rough sketch of those knobs is below.
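If it helps, here's a minimal sketch of what that could look like with Hugging Face `peft`, `transformers`, and `bitsandbytes`. The model name, r, batch size, and learning rate are illustrative placeholders, not tuned recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# QLoRA-style 4-bit loading: base weights are quantized (NF4) and frozen;
# only the LoRA adapters train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder; swap in an instruction-tuned base if you can
    quantization_config=bnb_config,
    device_map="auto",
)

# Larger r gives the adapter more capacity to move the base model.
lora_config = LoraConfig(
    r=32,                        # e.g. double whatever you're using now
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Bigger effective batch via gradient accumulation, plus a lower lr.
training_args = TrainingArguments(
    output_dir="lora-out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size of 32
    learning_rate=5e-5,             # below the ~2e-4 often used for LoRA
    num_train_epochs=3,
)
```

The QLoRA part is really just the 4-bit `BitsAndBytesConfig` load; training otherwise proceeds like plain LoRA, which is where the memory savings come from.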