Bitsandbytes quantization and QLoRA fine-tuning

Hello friends!

I want to fine-tune a quantized RoBERTa base model using the QLoRA approach. Below is the configuration.

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["classifier"],  # keep the classification head unquantized
)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=2,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
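
The adapter setup is the usual QLoRA recipe, roughly as in the sketch below (the exact LoraConfig values are placeholders, not tuned for anything):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Upcasts the non-quantized parameters (LayerNorm, bias, ...) to float32 and
# prepares the quantized model for gradient-based training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                # placeholder rank
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",
    target_modules=["query", "value"],  # RoBERTa self-attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()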

What I'm not sure I understand: when I look at the datatypes of the LoRA matrices, they are in float32. Also, after calling prepare_model_for_kbit_training, the remaining parts of the layers, everything except the quantized weights (bias, LayerNorm, ...), are converted to float32. Do these and the LoRA matrices have to stay in 32-bit format, or can they somehow be converted to 16-bit? When the LoRA matrices are combined with the model weight matrices, are the LoRA matrices cast to bfloat16, or is everything computed in float32? And is the full potential of the quantization still realized if some layers remain in 32-bit format?
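
For what it's worth, this is the kind of check I'm running to see the dtypes; the manual cast at the end is only an experiment, and I don't know whether forcing the LoRA matrices down to bfloat16 is actually safe for training stability:

# Inspect which parameters ended up in which dtype after
# prepare_model_for_kbit_training + get_peft_model.
for name, param in model.named_parameters():
    if param.requires_grad or "lora_" in name:
        print(name, param.dtype)

# Experiment only: force the LoRA A/B matrices to bfloat16.
# The default float32 may be there for numerical-stability reasons.
for name, param in model.named_parameters():
    if "lora_" in name:
        param.data = param.data.to(torch.bfloat16)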


Do you have an answer/clarification for this? I have a similar confusion: I am fine-tuning Llama 3.1 with QLoRA and cannot get the model to load so that the parameters have dtype torch.bfloat16.
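
Not an answer, but in case it helps someone reproduce: what I'm checking is roughly the sketch below (the model id and bnb settings are just the standard 4-bit QLoRA ones, and the repo is gated, so it requires access).

import torch
from collections import Counter
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",   # gated repo, requires access
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Count parameter dtypes: the 4-bit weights show up as torch.uint8
# (bitsandbytes Params4bit storage), while non-quantized parameters
# (embeddings, norms, modules skipped from quantization) stay in the
# requested torch_dtype.
print(Counter(p.dtype for p in model.parameters()))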
