Is there any actual performance improvement when using LoRA alone for SFT on the LLaMA 3.2 3B base model?

So typically, two things get in the way:

  1. LoRA capacity bottleneck on small models. With a 3B base and a modest rank, the adapter only trains a tiny fraction of the weights, so there is a hard limit on how far it can move the model (see the parameter-count sketch after this list).

  2. LoRA assumes a relatively “well-behaved” base model. If the base isn’t instruction-tuned or capable in your task domain, LoRA doesn’t get enough leverage to shift it into useful territory, especially without supervised signals.
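
Point 1 is easy to check concretely. Here is a minimal sketch (the model id, rank, and target modules are just illustrative assumptions) that prints how small a fraction of the 3B parameters a LoRA adapter actually trains:

```python
# Minimal sketch: how much capacity does a LoRA adapter add to a 3B model?
# Model id, rank, and target modules are illustrative assumptions, not values from this thread.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

lora_cfg = LoraConfig(
    r=16,                                   # adapter rank: the main capacity knob
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()          # typically well under 1% of the weights are trainable
```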

Try increasing the batch size and r, and decreasing the lr. You could try an instruct model instead of the base. You could also warm up the base model with some continued “pre-training” first… And look into QLoRA, which is a bit different (the frozen base weights stay quantized in 4-bit). A rough sketch of these tweaks is below.
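
This sketch assumes TRL's `SFTTrainer` and a placeholder dataset (`trl-lib/Capybara`); the rank, learning rate, and batch settings are illustrative directions to move in, not tuned values:

```python
# Sketch: larger effective batch, higher LoRA rank, lower LR, and QLoRA-style 4-bit loading.
# All hyperparameters and the dataset are assumptions meant to show the direction of the changes.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: the frozen base weights sit in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_cfg,
    device_map="auto",
)

peft_cfg = LoraConfig(
    r=64,                                   # higher rank -> more adapter capacity
    lora_alpha=128,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="llama32-3b-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,          # raises the effective batch size to 32
    learning_rate=1e-4,                     # lower than the common 2e-4 LoRA starting point
    num_train_epochs=1,
    bf16=True,
)

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder SFT dataset

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_cfg,
)
trainer.train()
```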

Hope this helps
