I am trying to run multiple rounds of fine-tuning on llama2-7b-chat. I completed the first round of fine-tuning and got adapter weights (let’s say I named the model name/llama-2-7b-chat-guanaco). I’m getting confused with the second round of fine-tuning, in which I want to fine-tune on top of the already fine-tuned model. I assumed that I’d be able to load the fine-tuned model like so and run the DPO pipeline:
from transformers import AutoModelForCausalLM
from peft import PeftModel

# quant_config is my quantization config, defined elsewhere in the script
model = AutoModelForCausalLM.from_pretrained(
    "name/llama-2-7b-chat-guanaco",
    quantization_config=quant_config,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, "name/llama-2-7b-chat-guanaco")
I trained and was able to push this new model to the Hugging Face Hub (let’s call it name/llama-2-7b-chat-guanaco-dpo). Oddly, the safetensors file is twice as large as the first pass, so I would appreciate some clarity on whether this is expected. But now I’m trying to serve that model for inference. Do I need to load the adapters in order so they are applied in the correct sequence? Something like this:
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

# (inside my inference class's __init__)
# Path to the first fine-tuned model (SFT adapters)
first_finetune_path = "name/llama-2-7b-chat-guanaco"
config = PeftConfig.from_pretrained(first_finetune_path)

# Load the original base model the first adapter was trained on
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_8bit=True,
    device_map="auto",
)

# Apply the first fine-tuned adapters (SFT)
self.model = PeftModel.from_pretrained(base_model, first_finetune_path)

# Path to the second fine-tuned model (DPO adapters)
second_finetune_path = "name/llama-2-7b-chat-guanaco-dpo"

# Load the second fine-tuned adapters (DPO) on top
self.model = PeftModel.from_pretrained(self.model, second_finetune_path)
Does it need to be loaded in this sequence, or does the order not matter? I would appreciate any guidance on how to think about adapters. Thanks!
Oddly, the safetensors file is twice as large as the first pass
Was the model saved in fp32 precision? If so, the file would be roughly twice the size of an fp16 checkpoint, since each parameter takes 4 bytes instead of 2.
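If you want to check, you can inspect the dtypes stored in the adapter checkpoint directly. This is just a minimal sketch; it assumes the adapter weights were pushed as adapter_model.safetensors in the name/llama-2-7b-chat-guanaco-dpo repo from your question.

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the adapter weights file and print each tensor's dtype
path = hf_hub_download("name/llama-2-7b-chat-guanaco-dpo", "adapter_model.safetensors")
with safe_open(path, framework="pt") as f:
    for key in f.keys():
        print(key, f.get_tensor(key).dtype)  # float32 = 4 bytes/param, float16 = 2 bytes/param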
Does it need to be loaded in this sequence, or does the order not matter? I would appreciate any guidance on how to think about adapters.
I can’t say for certain that it’s unrelated to your issue, but in general, changing the order in which you apply adapters rarely causes problems.
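For example, you can load both adapters onto the same base model with explicit names and control which one is active. This is only a sketch: the base model id is whatever config.base_model_name_or_path points to, and the adapter names "sft" and "dpo" are placeholders of mine; the repo names are the ones from your question.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the original base model (unquantized here, for simplicity)
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")

# First adapter (SFT), given an explicit name
model = PeftModel.from_pretrained(base, "name/llama-2-7b-chat-guanaco", adapter_name="sft")

# Second adapter (DPO), loaded alongside the first
model.load_adapter("name/llama-2-7b-chat-guanaco-dpo", adapter_name="dpo")

# Only the active adapter is used in the forward pass
model.set_adapter("dpo")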
Whether you call it an adapter or a LoRA, that part is handled by the PEFT library in the Hugging Face ecosystem, so reading the PEFT documentation will give you the full picture.
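The basic pattern PEFT uses is to wrap a base model with an adapter config; saving, loading, and stacking adapters all build on that. A minimal sketch, with hyperparameters chosen only for illustration:

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16
)

# Wrap the base model with a LoRA adapter; only the adapter weights are trainable
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()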
By the way, errors sometimes occur when applying a LoRA to a model that is already quantized. If you run into problems applying a LoRA in the quantized state, try applying it to the de-quantized model first, and then quantizing again.
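One way to do that, sketched below under my own assumptions: load the base model in half precision, merge the adapter into it with PEFT’s merge_and_unload, save the merged weights, and then reload them with a quantization config. The local path and the 4-bit settings are placeholders.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 1) Apply and merge the adapter on an unquantized (fp16) copy of the base model
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "name/llama-2-7b-chat-guanaco").merge_and_unload()
merged.save_pretrained("llama-2-7b-chat-guanaco-merged")

# 2) Reload the merged weights quantized, for inference or for the next training round
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "llama-2-7b-chat-guanaco-merged",
    quantization_config=quant_config,
    device_map="auto",
)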