Fine-tuning with LoftQ - CUDA out of memory error

I want to fine-tune a quantized Mistral on a 24 GB GPU, but I am getting a CUDA out of memory error. It seems the base_model is not quantized.

from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Loads the base model without any quantization.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
loftq_config = LoftQConfig(loftq_bits=4)

lora_config = LoraConfig(
    init_lora_weights="loftq",  # use the LoftQ initialization for the adapters
    loftq_config=loftq_config,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)

When you use LoftQ as the fine-tuning method, the model has to be loaded unquantized first. The quantization only happens under the hood, driven by loftq_config, when the adapters are initialized. With QLoRA, on the other hand, you can quantize the model directly through a parameter of from_pretrained, before fine-tuning starts.
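For comparison, here is a minimal sketch of the QLoRA route described above, assuming bitsandbytes is installed and a CUDA GPU is available. The NF4 settings, rank, alpha and dropout are illustrative assumptions rather than values from this thread, and load_qlora_model is a hypothetical helper name.

```python
MODEL_ID = "mistralai/Mistral-7B-v0.1"
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def load_qlora_model(model_id=MODEL_ID):
    """Load the base model already quantized to 4 bits, then attach LoRA adapters."""
    # Heavy imports are deferred so this file can be imported without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Quantization is requested here and applied as the checkpoint is loaded.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumed quantization type
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Freezes the base weights and prepares the quantized model for training.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16,               # assumed rank
        lora_alpha=32,      # assumed scaling
        lora_dropout=0.05,  # assumed dropout
        target_modules=TARGET_MODULES,
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

Calling load_qlora_model() returns a PEFT model in which only the adapter weights are trainable; print_trainable_parameters() on the result is a quick way to confirm that.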

If we have to initialize the model before quantization, does this not defeat the purpose of quantization?

Yep, which is why LoftQConfig was a confusing addition. You are meant to apply the LoftQ technique to the full-precision pre-trained weights first, as seen here. After that one-time step, you can quantize and save the model, so that in the future you only need to load the quantized model.
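A back-of-the-envelope estimate shows why the one-time full-precision load is the expensive step on a 24 GB card. The figure of roughly 7.2 billion parameters is an approximation for a 7B-class model, and the estimate covers the weights only, ignoring activations, gradients, optimizer state and quantization metadata.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 7.2 billion parameters for a 7B-class model.
NUM_PARAMS = 7.2e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights take about 28.8 GB in float32, 14.4 GB in bfloat16 and 3.6 GB at 4 bits. If the checkpoint ends up in float32, which the call in the question may do since it sets no dtype, the weights alone exceed 24 GB before training even starts.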

My suggestion is to either use one of the “LoftQ-applied” models already available on the Hugging Face Hub, or stick to QLoRA.
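Loading a LoftQ-applied checkpoint might look roughly like the sketch below. The repo id and the adapter subfolder are assumptions that should be checked against the model card of whichever checkpoint you pick, and load_loftq_model is a hypothetical helper name.

```python
# Assumed repo id and adapter layout; verify both on the model card.
LOFTQ_REPO = "LoftQ/Mistral-7B-v0.1-4bit-64rank"
ADAPTER_SUBFOLDER = "loftq_init"


def load_loftq_model(repo_id=LOFTQ_REPO, subfolder=ADAPTER_SUBFOLDER):
    """Load a backbone in 4 bits and attach its LoftQ-initialized adapters."""
    # Heavy imports are deferred so this file can be imported without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",              # assumed to match the checkpoint
            bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
        ),
        device_map="auto",
    )
    # is_trainable=True keeps the adapter weights trainable for fine-tuning.
    return PeftModel.from_pretrained(
        base, repo_id, subfolder=subfolder, is_trainable=True
    )
```

The point of this route is that the memory-hungry LoftQ initialization has already been done by whoever published the checkpoint, so only the 4-bit load happens on your GPU.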
