Correct way to save/load adapters and checkpoints in PEFT

Hi,

It is not clear to me what the correct way is to save/load a PEFT checkpoint, as well as the final fine-tuned model. There have been reports of trainer.resume_from_checkpoint not working as expected [1][2][3], each of which has very few replies and no clear consensus. Proposed solutions range from trainer.save_model, to trainer.save_state, to resume_from_checkpoint=True, to model.save_pretrained (PEFT docs), to even a very complicated procedure of merging and saving the model [4].

It is very confusing trying to figure out which of these is correct, especially if resume_from_checkpoint can be buggy. Loading/saving models really should not be this confusing, so can we resolve once and for all what the officially recommended (and tested) way is to save/load adapters, as well as individual checkpoints during training? Can we update the HF docs accordingly and simplify this process?
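For reference, here is a minimal sketch of the main options that keep getting proposed (the paths are illustrative; this assumes a Trainer already set up around a PEFT model):

# Option A: save only the adapter weights at the end of training
# (writes adapter_config.json + the adapter weights to the directory)
trainer.model.save_pretrained("my_adapter")

# Option B: resume training from the latest Trainer checkpoint in output_dir
trainer.train(resume_from_checkpoint=True)

# Option C: save the model/tokenizer and the trainer state separately
trainer.save_model("my_model_dir")  # saves model (and tokenizer) to the given dir
trainer.save_state()                # writes trainer_state.json to output_dir

It is unclear which of these is the officially supported path, which is exactly the question.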

@sgugger


Hi @remorax98
unfortunately Sylvain is no longer a part of :hugs: (just wanted to clarify this since lots of people keep tagging him)

also, for loading your PEFT model to continue training, there is a very easy parameter for this called is_trainable, which lets you load your PEFT model in a trainable state so you can continue training easily.
how to use it:
for my repo not-lain/Gemma-2b-Peft-finetuning, all I have to do is

# most of this code is from the button at the top right corner on 🤗
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

config = PeftConfig.from_pretrained("not-lain/Gemma-2b-Peft-finetuning")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
model = PeftModel.from_pretrained(
    model,
    "not-lain/Gemma-2b-Peft-finetuning",
    is_trainable=True,  # 👈 here
)
# check that it's working
model.print_trainable_parameters()
# >>> trainable params: 9,805,824 || all params: 2,515,978,240 || trainable%: 0.3897420034920493
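and once you have continued training, saving the updated adapter back out is just (a minimal sketch, with an illustrative local path):

# save the updated adapter weights to a local directory (path is illustrative)
model.save_pretrained("gemma-2b-peft-continued")
# or push them back to the Hub
# model.push_to_hub("not-lain/Gemma-2b-Peft-finetuning")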

heart react this comment if this solved the problem for you :hugs:

Hi @not-lain thanks for letting me know about Sylvain! Sorry, my bad.

Regarding this parameter, it is good to know and quite useful for me. But it does not answer my specific question - I am more concerned with the proper way to save adapters and checkpoints in HF, and the lack of clarity in the documentation about this.

Thanks anyway!

don’t mention it @remorax98.
I also made this notebook for you explaining all the steps, from the initial training to reloading the model and continuing training. It took me a lot of time to get done, but I hope it helps you out.

if this notebook helped you clarify how to use PEFT, please consider marking this conversation as solved :hugs:

regards,
hafedh hichri

Guys, can you help with proper examples of storing it to the file system instead of pushing it to the Hub? I tried
trainer.model.save_pretrained("shake_adapter")
and then

from peft import PeftModel

# base_model is the original pretrained model,
# loaded e.g. with AutoModelForCausalLM.from_pretrained(...)
shmodel = PeftModel.from_pretrained(
    base_model,
    "shake_adapter",
    is_trainable=True,
)

but it does not update my base model at all

@adiudiun the way PEFT works is that it trains adapters on top of the (frozen) base model.
The image below shows how you can visualize LoRA.

I think what you’re looking for is the save_embedding_layers parameter of save_pretrained; could you try setting it manually to True?

[image: diagram visualizing LoRA adapters]
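For example (a minimal sketch, reusing the directory name from your snippet; save_embedding_layers is a parameter of PeftModel.save_pretrained):

# save the adapter to the local file system, forcing the embedding layers
# to be saved alongside the adapter weights
trainer.model.save_pretrained("shake_adapter", save_embedding_layers=True)

Also note that loading an adapter does not modify the base model's weights in place; if you want a single standalone model, you can fold the adapter in with merged = shmodel.merge_and_unload() and save that instead.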