Fine-tuning using LoftQ - CUDA out of memory error

I want to fine-tune a quantized Mistral model on a 24 GB GPU, but I am getting a CUDA out-of-memory error. It seems the base model is not quantized.

from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")  # base model is loaded in full precision here
loftq_config = LoftQConfig(loftq_bits=4)

lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

When using LoftQ as a fine-tuning method, you need to initialize the full-precision model first; quantization is only done under the hood via loftq_config. If you were to use QLoRA instead, you could quantize the model directly with a parameter passed to from_pretrained before fine-tuning.
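
For comparison, here is a rough sketch of that QLoRA route, assuming bitsandbytes is installed; the specific quantization settings (nf4, bfloat16 compute, double quantization) are common choices rather than requirements:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize at load time, so the 7B base model fits in 24 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Plain LoRA on top of the 4-bit base model; no LoftQConfig involved.
lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)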

If we have to initialize the model before quantization, does this not defeat the purpose of quantization?

Yep, which is why LoftQConfig was a confusing addition. You are meant to apply the LoftQ technique to the full-precision pre-trained weights first, as seen here. From then on, you can quantize and save the model, so that in the future you only need to load the quantized model.
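
As a rough sketch of that one-off workflow (the output paths are placeholders, and it assumes a machine with enough memory to hold the full-precision model while the LoftQ initialization runs):

peft_model = get_peft_model(model, lora_config)  # model / lora_config as in the snippet above

# Save the LoftQ-initialized LoRA adapters for reuse.
peft_model.save_pretrained("mistral-7b-loftq-adapters")

# Save the adjusted base weights as well; later runs can re-load them in 4-bit
# (e.g. via a BitsAndBytesConfig) so the full-precision step never has to be repeated.
peft_model.get_base_model().save_pretrained("mistral-7b-loftq-base")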

My suggestion is to either use the already available “LoftQ-applied” models on HuggingFace, or stick to QLoRA.
