I am currently working on a project where I want to fine-tune an LLM2Vec model with QLoRA. The ‘McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised’ model consists of two LoRA adapters which, according to the documentation, should be loaded in the following way:
from transformers import AutoConfig, AutoModel
from peft import PeftModel

config = AutoConfig.from_pretrained(...)
model = AutoModel.from_pretrained(...)

# Load the first adapter and merge it into the base weights
model = PeftModel.from_pretrained(
    model, first_adapter,
)
model = model.merge_and_unload()

# Load the second adapter on top of the merged model
model = PeftModel.from_pretrained(
    model, second_adapter
)
I now want to fine-tune the LLM2Vec model on a downstream task using QLoRA. The usual approach for QLoRA according to the Hugging Face documentation is to load the model with from_pretrained and a 4-bit bnb config LINK (see the sketch below). Merging an adapter into an already quantized model leads to rounding errors and degrades model performance, so I am looking for a way to apply the quantization only after merging the adapters into the base model. Thanks for the help in advance.
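A minimal sketch of that standard setup, with a placeholder checkpoint path and example hyperparameters (not my actual code):

import torch
from transformers import AutoModel, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization as described in the Hugging Face QLoRA docs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "some-checkpoint" is a placeholder for the (already merged) model I want to quantize
model = AutoModel.from_pretrained("some-checkpoint", quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Fresh LoRA adapter for the downstream task (hyperparameters are just examples)
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none")
model = get_peft_model(model, peft_config)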
What I tried so far:
- Merging both adapters with merge_and_unload, calling save_pretrained on the result, and then loading it again with from_pretrained plus the quantization config.
- I get the error “UnboundLocalError: cannot access local variable ‘active_adapters’ where it is not associated with a value” while saving the model with save_pretrained(). I am not sure if this is a bug or intended behaviour.
# ...previous code
model = model.merge_and_unload()
model.save_pretrained(...)
# UnboundLocalError: cannot access local variable 'active_adapters' where it is not associated with a value

# Reload the merged checkpoint with 4-bit quantization
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
quantized_model = AutoModel.from_pretrained(..., quantization_config=bnb_config)
- Calling save_pretrained after adding an empty LoRA adapter for fine-tuning and then loading the model with from_pretrained plus quantization.
# ...previous code
model = model.merge_and_unload()

# Attach a fresh (empty) LoRA adapter so that save_pretrained works
peft_config = LoraConfig(...)
model = get_peft_model(model, peft_config)
model.save_pretrained(...)

# Load the model with quantization
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
quantized_model = AutoModel.from_pretrained(..., quantization_config=bnb_config)
quantized_peft_model = PeftModel.from_pretrained(
    quantized_model,
    empty_adapter,
)
- This seems to work (see the sanity-check sketch below). I am not sure, though, whether this is the intended way to do it, or why I can't save the model without an adapter attached.
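For reference, a small sanity-check sketch for this second approach. It assumes the quantized_peft_model from the snippet above and that the freshly added adapter is supposed to end up trainable (if I read the PEFT docs correctly, PeftModel.from_pretrained loads adapters frozen unless is_trainable=True is passed):

# Sketch: verify that only the new LoRA weights are trainable on top of
# the 4-bit base weights (assumes quantized_peft_model from the code above)
quantized_peft_model.print_trainable_parameters()
for name, param in quantized_peft_model.named_parameters():
    if param.requires_grad:
        print(name)  # should list only lora_A / lora_B parameters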
Here is the model specification of the merged and unloaded model:
MistralEncoderModel(
  (embed_tokens): Embedding(32000, 4096)
  (layers): ModuleList(
    (0-31): 32 x ModifiedMistralDecoderLayer(
      (self_attn): ModifiedMistralSdpaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (rotary_emb): MistralRotaryEmbedding()
      )
      (mlp): MistralMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): MistralRMSNorm()
      (post_attention_layernorm): MistralRMSNorm()
    )
  )
  (norm): MistralRMSNorm()
)