I have trained a QLoRA adapter for a language model using the Hugging Face PEFT library.
Given this adapter, is it possible to create a new model, the same size as the base model, whose weights already have the QLoRA perturbation applied? That is, I would like to merge the QLoRA adapter back into the original weight space so that I can simplify my inference script.
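To make concrete what I mean by "applying the perturbation": as I understand LoRA, each adapted linear layer computes `W x + (alpha / r) * B (A x)`, so the merged weight would be `W + (alpha / r) * B A`. Here is a toy sketch in plain Python; the matrices, shapes, and helper names (`matmul`, `merge_lora`) are made up for illustration and have nothing to do with PEFT's internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B A, the weight with the low-rank update folded in."""
    scale = alpha / r
    delta = matmul(b, a)  # (out, r) @ (r, in) -> (out, in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy layer: 2 outputs, 3 inputs, rank-1 adapter (all values are arbitrary).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, shape (out, in)
A = [[0.5, -0.5, 1.0]]                   # LoRA A, shape (r, in)
B = [[2.0], [4.0]]                       # LoRA B, shape (out, r)
alpha, r = 2, 1
x = [[1.0], [2.0], [3.0]]                # input column vector

# Base model plus adapter, evaluated separately.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
y_adapter = [[bo[0] + (alpha / r) * lo[0]] for bo, lo in zip(base_out, lora_out)]

# Single merged weight matrix.
W_merged = merge_lora(W, A, B, alpha, r)
y_merged = matmul(W_merged, x)
```

In this toy case the two outputs agree, which is the behavior I would like from a merged checkpoint of the full model.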
More details: QLoRA is extremely helpful during training, but for inference I would like to work with a single checkpoint representing the whole model, rather than a base checkpoint plus an adapter. That way, I’ll be able to use inference tools that expect a single checkpoint and do not know about QLoRA adapters.
So what I’m looking for is code that loads (1) a base checkpoint and (2) a QLoRA adapter, and writes out a new checkpoint representing the model with the QLoRA perturbation applied. Is this possible?
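Roughly, this is the shape of script I have in mind. It is an untested sketch: the function name `merge_adapter` and its three path arguments are my own placeholders, and I am assuming that loading the base model unquantized in half precision and calling PEFT's `merge_and_unload()` is the right way to fold the adapter in. I have not checked how this interacts with a 4-bit quantized base.

```python
def merge_adapter(base_path: str, adapter_path: str, output_path: str) -> None:
    """Load a base checkpoint and a LoRA/QLoRA adapter, then save one merged checkpoint."""
    # Heavy dependencies are imported inside the function so that merely
    # defining it does not require them to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base is loaded unquantized (fp16) so the adapter
    # weights can be added directly to the dense weight matrices.
    base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)

    # Attach the trained adapter to the base model.
    model = PeftModel.from_pretrained(base, adapter_path)

    # Fold the adapter weights into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Write a standalone checkpoint, plus the tokenizer for convenience.
    merged.save_pretrained(output_path)
    AutoTokenizer.from_pretrained(base_path).save_pretrained(output_path)
```

I would then call it as `merge_adapter("path/to/base", "path/to/adapter", "path/to/merged")`, where all three paths are placeholders.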