LoRA vs QLoRA finetuning performance on llama2

rjtshrm · September 4, 2023, 6:55pm

I am fine-tuning Llama 2 using LoRA and QLoRA to compare the two. I first trained with LoRA, adding a special end token <|end|> so that the model knows when to stop. With LoRA fine-tuning this works fine, and the model predicts the <|end|> token. Keeping the training configuration the same apart from the 4-bit quantization for QLoRA, I see that the model cannot predict <|end|>.

Also, when I prepare the PEFT model, I first pass the loaded model through prepare_model_for_kbit_training and then call get_peft_model. Do I need to call prepare_model_for_kbit_training when I use 4-bit quantization in QLoRA? If I don't, I get a CUDA OOM error. Everything else, such as batch size and all other parameters, is kept the same for LoRA and QLoRA.
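For reference, the preparation order described above can be sketched roughly as follows. This is only a minimal sketch, not my exact script: the function name build_qlora_model, the NF4/bfloat16 settings, and the LoRA hyperparameters (r, lora_alpha, target_modules) are placeholder assumptions. As far as I can tell, prepare_model_for_kbit_training turns on gradient checkpointing by default, which may be why skipping it changes memory use.

```python
def build_qlora_model(model_name: str):
    """Load a causal LM in 4-bit and wrap it with LoRA adapters.

    Hypothetical helper for illustration; all hyperparameters are assumptions.
    """
    # Imported lazily so this file can be imported without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # 4-bit quantization settings (assumed values, not taken from the post).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Step 1: load the base model with quantized weights.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Step 2: prepare for k-bit training. With the default
    # use_gradient_checkpointing=True this also enables gradient checkpointing.
    model = prepare_model_for_kbit_training(model)

    # Step 3: attach the LoRA adapters (placeholder hyperparameters).
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

For the plain LoRA run, the same sketch would apply without the quantization_config argument and without the prepare_model_for_kbit_training step.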

What could be the reason for the lower accuracy with QLoRA? As I understand it, QLoRA reduces GPU memory usage, but does it also affect model performance?
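To put a rough number on the memory side, here is a back-of-the-envelope sketch of the weight footprint only. The 7 billion parameter count is a nominal assumption for the smallest Llama 2 model, and the estimate ignores quantization constants, activations, optimizer state, and the LoRA adapters themselves.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory needed to store the model weights, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Nominal parameter count (assumption, not the exact figure).
N_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # 16-bit base weights (LoRA)
nf4_gib = weight_memory_gib(N_PARAMS, 4)    # 4-bit base weights (QLoRA)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {nf4_gib:.2f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the space of the 16-bit ones, roughly 13 GiB versus a little over 3 GiB.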
