Projecting a QLoRA adapter to original model space

alexflint · July 13, 2023, 2:16pm

I have trained a QLoRA adapter for a language model using the Hugging Face PEFT library.

Given this adapter, is it possible to create a new model, the same size as the base model, whose weights have the QLoRA perturbation applied? That is, I would like to project the QLoRA adapter onto the original model space so that I can simplify my inference script.

More details: QLoRA is extremely helpful during training, but for inference I would like to work with a single checkpoint representing the whole model, rather than a checkpoint plus an adapter. That way, I’ll be able to use inference tools that expect a single checkpoint and know nothing about QLoRA adapters.

So what I’m looking for is code that loads (1) a base checkpoint and (2) a QLoRA adapter, and outputs a new checkpoint representing the model with the QLoRA perturbation applied. Is this possible?
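
To make "applying the perturbation" concrete, here is a toy sketch of the arithmetic involved. It assumes the usual LoRA form, where each adapted weight matrix W is replaced by W + (alpha / r) · B · A. The helper names (`matmul`, `merge_lora`) and all values are hypothetical and only illustrate the idea on one small matrix. With a real QLoRA adapter, the base weights would presumably have to be loaded in a non-quantized precision before a merge like this, and PEFT's `merge_and_unload()` on a `PeftModel` appears to be the library's route for it.

```python
# Toy illustration of folding one LoRA update into a base weight matrix.
# Names and numbers are hypothetical; a real adapter holds many (A, B) pairs.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, which has the same shape as W."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # out_features x in_features, same as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Base weight: 2 outputs x 3 inputs, with a rank-1 adapter.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x in_features
B = [[0.5], [-1.0]]     # out_features x r

merged = merge_lora(W, A, B, alpha=2, r=1)

# The merged matrix alone should reproduce "base output + adapter output".
x = [[1.0], [1.0], [1.0]]
base_plus_adapter = [[b[0] + 2 * a[0]] for b, a in zip(matmul(W, x), matmul(B, matmul(A, x)))]
merged_output = matmul(merged, x)
```

Since `merged` has the same shape as `W`, a checkpoint built this way would be the same size as the base model and would need no adapter at inference time.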
