Hello,
I have fine-tuned a model with a PEFT configuration and I am now trying to load it for inference.
I have read in this answer that loading the model you saved with
trainer.model.save_pretrained(new_model)
using AutoModelForCausalLM:
model = AutoModelForCausalLM.from_pretrained(
    new_model,
    quantization_config=bnb_config,
    device_map=device_map,
)
would load both the base model and the adapters.
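(For context, the imports plus the bnb_config and device_map used in both snippets are roughly the following; it is the standard 4-bit NF4 QLoRA-style setup, so treat the exact values as placeholders:)

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# assumed 4-bit NF4 quantization config (placeholder values)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# load everything onto the first GPU
device_map = {"": 0}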
However, most tutorials suggest using merge_and_unload() when working with PEFT models:
base_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    device_map=device_map,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(
    model=base_model,
    model_id=new_model,
    quantization_config=bnb_config,
    device_map=device_map,
)
model = model.merge_and_unload(safe_merge=True)
As far as I understand, these two blocks of code should yield the same result, but when I print the model structures they are nothing alike. Moreover, they produce different outputs at inference time: the output of the latter (the merge_and_unload approach) resembles the base model, whereas the output of the former (AutoModelForCausalLM on the saved directory) looks like the training data I used.
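For reference, this is roughly how I run inference on each of the two models (base_model_name below is a placeholder for the base checkpoint id, and the prompt is just an example):

import torch
from transformers import AutoTokenizer

# tokenizer taken from the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

prompt = "An example prompt similar to my training data"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# same deterministic generation settings for both models
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))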