Loading and saving a model

I’m trying to fine-tune a model over several days because of time limitations. So a few epochs one day, a few epochs the next, etc. However, every time I load from the adapter directory saved by the previous training session, the model that loads is the base model, as if no fine-tuning had occurred! I’m not sure what is happening. Does anyone have any advice on how to fix this? Is it a result of my saving strategy or of using early-stopping patience?

My model loading and training arguments are as follows:

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer

# Load the locally saved model from the previous session
adapter_path = "./model"
model = AutoModelForCausalLM.from_pretrained(
    adapter_path,
    quantization_config=quantization_config,
    device_map={"": 0},
    token=huggingface_token,
)
model.config.use_cache = False   # disable KV cache during training
model.config.pretraining_tp = 1

training_params = TrainingArguments(
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    num_train_epochs=3,
    output_dir="./newresults",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=80,
    optim="adamw_torch",
    learning_rate=1e-5,
    weight_decay=0.002,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = SFTTrainer(
    model=model,
    args=training_params,
    train_dataset=new_dataset,
    eval_dataset=valid_set,
    data_collator=data_collator,
    peft_config=peft_params,
    dataset_text_field="input",
    tokenizer=tokenizer,
    packing=False,
    callbacks=[callback],
)

trainer.train()
trainer.save_model("./model")
```
This should save a checkpoint at every epoch in my local ./newresults directory and the final fine-tuned model in ./model. But when I try to load from either directory for the next round of training, the model that loads is the base model, not the fine-tuned one. What might be the reason? Also, is there a way to tell which model has been loaded before training starts? Right now I can only tell because the re-loaded model's loss after an epoch of training is exactly what it was after the very first epoch of training from the base model.
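
For what it's worth, the only programmatic check I can think of is fingerprinting the weights right after loading; a rough sketch (`model` is whatever `from_pretrained` just returned):

```python
def weight_fingerprint(model):
    """Cheap fingerprint: sum of absolute values over all parameters."""
    total = 0.0
    for p in model.parameters():
        total += p.detach().abs().sum().item()
    return total

# Identical fingerprints for the base model and the "fine-tuned" checkpoint
# would mean the fine-tuned weights never made it into the loaded model.
print(weight_fingerprint(model))
```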

So I think I found a solution to this, but if anyone has more info on this topic please lmk! After the first training epochs, save the fine-tuned model. Then re-load the base model into some variable and use the merge_and_unload() method to merge the fine-tuned adapter into the base model. Then save the merged model. The saved merged model will be the size of the base model, with the fine-tuned layers incorporated. To fine-tune further, load the merged model and train it as if it were the base model.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Save the fine-tuned adapter
trainer_filepath = f"trainer/llama7b/{train_util.get_time()}"
trainer.model.save_pretrained(trainer_filepath)

# Reload the base model
base_model = AutoModelForCausalLM.from_pretrained(model_name, token=huggingface_token)

# Apply the adapter to the base model, then merge the weights in
merged_model = PeftModel.from_pretrained(base_model, trainer_filepath)
merged_model = merged_model.merge_and_unload()

# Save the merged model
merged_model_path = f"model/llama7b/merged_{train_util.get_time()}"
merged_model.save_pretrained(merged_model_path)
```
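
Then, for the next round of training, the merged checkpoint can be loaded exactly like a base model; a sketch, reusing the `quantization_config` and `merged_model_path` from above:

```python
from transformers import AutoModelForCausalLM

# Next session: treat the merged checkpoint as the new "base" model
model = AutoModelForCausalLM.from_pretrained(
    merged_model_path,
    quantization_config=quantization_config,
    device_map={"": 0},
)
# ...then rebuild SFTTrainer around it exactly as in the first round.
```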

If there’s a better solution to this problem, please lmk!

There are two things: the base model and the adapter. You can save the adapter with
`model.save_pretrained("adapter")`
In this adapter folder, the config file records which base model the adapter belongs to. Note that if you make changes to the base model, such as resizing the vocabulary, you will need to save that modified base model and update the base-model path in the adapter config.
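
As a sketch of that update (the folder name `adapter` matches the save above; `base_model_name_or_path` is the key PEFT writes into `adapter_config.json`, and the new path is a made-up example):

```python
import json

with open("adapter/adapter_config.json") as f:
    cfg = json.load(f)

print(cfg["base_model_name_or_path"])  # where the adapter will look for the base model

# Point the adapter at the modified, re-saved base model (hypothetical path):
cfg["base_model_name_or_path"] = "./resized_base_model"
with open("adapter/adapter_config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```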
Then you can load the adapter (together with its base model) with:

```python
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("adapter")
```

Further, if you don’t want the adapter to be kept separate, you can merge it into the base model with
`model = model.merge_and_unload()`
or save the merged weights directly with
`model.save_pretrained_merged("merged_model", tokenizer)`
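
(As far as I know, `save_pretrained_merged` is a helper from Unsloth’s wrapper rather than plain PEFT; with plain PEFT the equivalent would be roughly:)

```python
# Sketch assuming a plain PEFT model; merge_and_unload() returns the merged model
merged = model.merge_and_unload()
merged.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")
```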