Loading quantised weights does not work

I would like to fine-tune large models (1.3B and 7B parameters) using LoRA. I need to reduce the cost further than LoRA alone provides, and a tutorial suggested quantising the weights to 8 bit. Hence, my code is this:

from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

bandb_config = BitsAndBytesConfig(load_in_4bit=True)
config = AutoConfig.from_pretrained(cfg.model.architecture, quantization_config=bandb_config)
model = AutoModelForCausalLM.from_pretrained(cfg.model.architecture, config=config, cache_dir=model_dir)

from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=cfg.training.lora.rank,
    lora_alpha=cfg.training.lora.alpha,
    lora_dropout=cfg.training.lora.dropout,
)

model = get_peft_model(model, peft_config)
print("Trainable Parameters:")
model.print_trainable_parameters()

training_args = TrainingArguments(bf16=True, ...)
trainer = Trainer(model=model, args=training_args, ...)
trainer.train()

However, it has no effect: 4-bit, 8-bit, and no quantization all require the same memory and the same training time. Furthermore, after defining the model I checked the weights that were loaded; they are all fp32.
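For reference, this is roughly how I checked the loaded weights (a minimal sketch that just iterates over the model's parameters and prints each dtype):

# Print the dtype of every loaded parameter; each one comes out as torch.float32
for name, param in model.named_parameters():
    print(name, param.dtype)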