I'm trying to implement a complex training pipeline where models can be re-fine-tuned in an RL style. However, I can't make it work using transformers + peft. The issue is that transformers refuses to load the correct model. Here is a minimal example:
import pathlib
import torch
from peft import LoraConfig, TaskType, get_peft_model, PeftConfig, PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer, ModernBertForSequenceClassification

def init_model(path_to_dir: pathlib.Path) -> None:
    base_model = AutoModelForSequenceClassification.from_pretrained(
        pretrained_model_name_or_path="answerdotai/ModernBERT-large",
        num_labels=1,
        torch_dtype=torch.float32,
        problem_type="regression",
        device_map="cuda"
    )
    tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    tokenizer.add_tokens(["[USER]", "[/USER]", "[EOT]"])
    tokenizer.chat_template = (
        "{% for i in range(0, messages|length, 2) %}"
        "{% if i + 1 < messages|length %}"
        "[USER]{{ messages[i].content }}[/USER] {{ messages[i+1].content }}[EOT]\n"
        "{% endif %}"
        "{% endfor %}"
    )
    base_model.resize_token_embeddings(len(tokenizer))

    peft_config = LoraConfig(
        r=4,
        lora_alpha=32,
        task_type=TaskType.SEQ_CLS,
        target_modules="all-linear"
    )
    model = get_peft_model(base_model, peft_config)

    model.save_pretrained(path_to_dir)
    model.base_model.save_pretrained(path_to_dir)
    tokenizer.save_pretrained(path_to_dir)

def reload_model(path_to_dir: pathlib.Path) -> None:
    tokenizer = AutoTokenizer.from_pretrained(path_to_dir)
    base_model = ModernBertForSequenceClassification.from_pretrained(
        str(path_to_dir),
        num_labels=1,
        torch_dtype=torch.float32,
        device_map="cuda"
    )
    config = PeftConfig.from_pretrained(str(path_to_dir))
    base_model.resize_token_embeddings(len(tokenizer))
    model = PeftModel.from_pretrained(
        base_model,
        str(path_to_dir),
        is_trainable=True,
        config=config,
        device_map="cuda"
    )


if __name__ == "__main__":
    init_model(pathlib.Path("/tmp/test"))
    reload_model(pathlib.Path("/tmp/test"))
In the above example, I expect a model to be initialized (randomly, that's fine), stored to disk, and then reloaded. In the real-world pipeline, the model would make predictions, a score would be computed, and then, on the next step, the model would be reloaded, fine-tuned, and stored again for the following training step.
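To give a bit more context, here is a rough sketch of the loop I have in mind; apart from init_model and reload_model, the names and comments are placeholders, not code from my actual pipeline:

def training_loop(path_to_dir: pathlib.Path, num_steps: int = 3) -> None:
    # Sketch only: initialize once, then repeatedly reload, score, fine-tune and
    # save, so that each step starts from the weights saved by the previous one.
    init_model(path_to_dir)
    for _ in range(num_steps):
        reload_model(path_to_dir)  # must restore exactly what was saved
        # ... run predictions, compute a score/reward, fine-tune the adapter ...
        # ... then save the model back to path_to_dir for the next iteration ...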
Now, when I run this script, I'm facing two issues I can't work around.
First, transformers seems to ignore that the model was previously initialized and does not load classifier.weight and classifier.bias:
Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at answerdotai/ModernBERT-large and are newly initialized: ['classifier.bias', 'classifier.weight']
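Just to make the symptom concrete, a quick check (a sketch, not part of the script above) would be to keep a copy of the classifier weight in init_model and compare it after reloading:

saved_head = base_model.classifier.weight.detach().cpu().clone()  # inside init_model, before saving
reloaded_head = base_model.classifier.weight.detach().cpu()       # inside reload_model, after loading
print(torch.equal(saved_head, reloaded_head))                     # should be True if the head were restored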
Secondly, it does not recognize that I have resized the base model's token embeddings (i.e., base_model.resize_token_embeddings(len(tokenizer))) and it throws an error:
Error(s) in loading state_dict for ModernBertForSequenceClassification:
size mismatch for model.embeddings.tok_embeddings.weight: copying a param with shape torch.Size([50371, 1024]) from checkpoint, the shape in current model is torch.Size([50368, 1024]).
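For reference, this is the kind of snippet I would drop into reload_model to compare the two sizes mentioned in the error (a sketch only):

# Sketch: the two vocabulary sizes involved in the size mismatch.
print("tokenizer size:", len(tokenizer))
print("embedding rows:", base_model.get_input_embeddings().weight.shape[0])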
These are the files it created:
$ ls -lhrt /tmp/test/
total 208M
-rw-r--r-- 1 gatti data 5,0K juil. 21 16:40 README.md
-rw-r--r-- 1 gatti data 204M juil. 21 16:40 adapter_model.safetensors
-rw-r--r-- 1 gatti data 828 juil. 21 16:40 adapter_config.json
-rw-r--r-- 1 gatti data 170 juil. 21 16:40 chat_template.jinja
-rw-r--r-- 1 gatti data 21K juil. 21 16:40 tokenizer_config.json
-rw-r--r-- 1 gatti data 694 juil. 21 16:40 special_tokens_map.json
-rw-r--r-- 1 gatti data 3,5M juil. 21 16:40 tokenizer.json
It does not seem to be storing the classifier, which is odd at best, since I explicitly called model.base_model.save_pretrained(path_to_dir).
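To double-check what actually ended up on disk, the contents of adapter_model.safetensors can be listed (a sketch using the safetensors API):

from safetensors import safe_open

# Sketch: list the tensors actually serialized into the adapter file.
with safe_open("/tmp/test/adapter_model.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        print(key)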
Besides, if I inspect adapter_config.json:
$ cat /tmp/test/adapter_config.json
{
    // ...
    "base_model_name_or_path": "answerdotai/ModernBERT-large",
    // ...
}
It is storing answerdotai/ModernBERT-large as the base model in the config, which seems incorrect, since the base should be my custom classifier model. I don't understand what's going on.
Thanks for any enlightenment.