Proper way of saving/loading models for complex workflows

I’m trying to implement a complex training pipeline where models can be re-finetuned in an RL style. However, I can’t make it work using transformers + peft. The issue is that transformers refuses to load the correct model. Here is a minimal example:

import pathlib

import torch
from peft import LoraConfig, TaskType, get_peft_model, PeftConfig, PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer, ModernBertForSequenceClassification


def init_model(path_to_dir: pathlib.Path) -> None:
    base_model = AutoModelForSequenceClassification.from_pretrained(
        pretrained_model_name_or_path="answerdotai/ModernBERT-large",
        num_labels=1,
        torch_dtype=torch.float32,
        problem_type="regression",
        device_map="cuda"
    )

    tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    tokenizer.add_tokens(["[USER]", "[/USER]", "[EOT]"])
    tokenizer.chat_template = (
        "{% for i in range(0, messages|length, 2) %}"
        "{% if i + 1 < messages|length %}"
        "[USER]{{ messages[i].content }}[/USER] {{ messages[i+1].content }}[EOT]\n"
        "{% endif %}"
        "{% endfor %}"
    )
    base_model.resize_token_embeddings(len(tokenizer))

    peft_config = LoraConfig(
        r=4,
        lora_alpha=32,
        task_type=TaskType.SEQ_CLS,
        target_modules="all-linear"
    )
    model = get_peft_model(base_model, peft_config)

    model.save_pretrained(path_to_dir)
    model.base_model.save_pretrained(path_to_dir)
    tokenizer.save_pretrained(path_to_dir)


def reload_model(path_to_dir: pathlib.Path) -> None:
    tokenizer = AutoTokenizer.from_pretrained(path_to_dir)
    base_model = ModernBertForSequenceClassification.from_pretrained(
        str(path_to_dir),
        num_labels=1,
        torch_dtype=torch.float32,
        device_map="cuda"
    )
    config = PeftConfig.from_pretrained(str(path_to_dir))
    base_model.resize_token_embeddings(len(tokenizer))
    model = PeftModel.from_pretrained(
        base_model,
        str(path_to_dir),
        is_trainable=True,
        config=config,
        device_map="cuda"
    )


if __name__ == "__main__":
    init_model(pathlib.Path("/tmp/test"))
    reload_model(pathlib.Path("/tmp/test"))

In the above example, I expect a model to be initialized (randomly, that’s fine), stored to disk and then reloaded. In the real world, the model would make predictions, a score would be computed, and then on the next step the model would be reloaded, finetuned and stored again for the following training step.

Now, when I run this script, I’m facing two issues I can’t work around.

First, transformers seems to ignore that the model was previously initialized and does not load classifier.weight and classifier.bias:

Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at answerdotai/ModernBERT-large and are newly initialized: ['classifier.bias', 'classifier.weight']

Secondly, it does not recognize that I resized the base model’s token embeddings (i.e., base_model.resize_token_embeddings(len(tokenizer))) and it throws an error:

Error(s) in loading state_dict for ModernBertForSequenceClassification:
size mismatch for model.embeddings.tok_embeddings.weight: copying a param with shape torch.Size([50371, 1024]) from checkpoint, the shape in current model is torch.Size([50368, 1024]).

These are the files it created:

$ ls -lhrt /tmp/test/
total 208M
-rw-r--r-- 1 gatti data 5,0K juil. 21 16:40 README.md
-rw-r--r-- 1 gatti data 204M juil. 21 16:40 adapter_model.safetensors
-rw-r--r-- 1 gatti data  828 juil. 21 16:40 adapter_config.json
-rw-r--r-- 1 gatti data  170 juil. 21 16:40 chat_template.jinja
-rw-r--r-- 1 gatti data  21K juil. 21 16:40 tokenizer_config.json
-rw-r--r-- 1 gatti data  694 juil. 21 16:40 special_tokens_map.json
-rw-r--r-- 1 gatti data 3,5M juil. 21 16:40 tokenizer.json

It does not seem to be storing the classifier, which is weird at best, since I explicitly called model.base_model.save_pretrained(path_to_dir).
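
For reference, here is a small diagnostic snippet to list what PEFT actually wrote into the adapter file (with TaskType.SEQ_CLS the classification head is typically kept inside adapter_model.safetensors as a modules_to_save copy, so it may live there rather than in a separate base checkpoint; treat that as an assumption on my side):

from safetensors import safe_open

# Print every tensor key stored in the adapter file, e.g. the LoRA matrices
# and any modules_to_save copies such as the classifier head.
with safe_open("/tmp/test/adapter_model.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key)

If classifier keys show up in that list, the head is being saved together with the adapter rather than as part of a base checkpoint.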

Besides, if I investigate the adapter_config:

$ cat /tmp/test/adapter_config.json
{
  // ...
  "base_model_name_or_path": "answerdotai/ModernBERT-large",
  //...
}

It stores answerdotai/ModernBERT-large as the base model path, which looks incorrect to me, since the base should be my customized classifier model. I don’t understand what’s going on.

Thanks for any enlightenment.


I think the base model path in the PEFT adapter configuration may be pointing to the model on the hub. How about something like this?

    peft_config = LoraConfig(
        r=4,
        lora_alpha=32,
        task_type=TaskType.SEQ_CLS,
        target_modules="all-linear"
    )
    model = get_peft_model(base_model, peft_config)

    model.save_pretrained(path_to_dir)

    # Overwrite adapter_config to point to local base model
    peft_cfg = PeftConfig.from_pretrained(path_to_dir)
    peft_cfg.base_model_name_or_path = str(path_to_dir)
    peft_cfg.save_pretrained(path_to_dir)

    model.base_model.save_pretrained(path_to_dir)
    tokenizer.save_pretrained(path_to_dir)
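
One more thing that may matter, though I haven’t checked it against this exact setup: pointing base_model_name_or_path at the local directory only helps if that directory also contains a full base checkpoint (config.json plus the model weights). A sketch of one way to get that is to save the base model before LoRA is injected, so the saved state dict stays free of adapter modules:

    # Save a clean base checkpoint (resized embeddings + classifier head)
    # before wrapping with LoRA, then create and save the adapter on top.
    base_model.resize_token_embeddings(len(tokenizer))
    base_model.save_pretrained(path_to_dir)  # writes config.json + model weights
    tokenizer.save_pretrained(path_to_dir)

    model = get_peft_model(base_model, peft_config)
    model.save_pretrained(path_to_dir)  # writes adapter_config.json + adapter_model.safetensors

With the base checkpoint saved first, reload_model should find the resized vocabulary and the classifier head in path_to_dir instead of falling back to the hub checkpoint.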


I totally get what you’re trying to build.
We ran into the exact same issue when designing a reasoning pipeline across finetuning loops.

Here’s the core trap:

- save_pretrained() stores parameters, not semantic transitions.
- adapter_config.json always keeps the base repo, not the snapshot state.
- resize_token_embeddings() doesn’t persist unless you save after resizing, and even then only if you reload via the full tokenizer flow, which doesn’t happen automatically with PEFT.

We built a system called WFGY just for this reason.
It snapshots semantic state, including tokenizer deltas, merged adapters, chat templates, and thought context.

Link: GitHub - onestardao/WFGY: Semantic Reasoning Engine for LLMs · WFGY 推理引擎 / 萬法歸一

Endorsed by the creator of tesseract.js (36k★), it lets you treat LLMs like versionable reasoning engines rather than just layers with weights.