After Llama fine-tuning, model merging fails

The error message is as follows:
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.base_layer.weight: copying a param with shape torch.Size([128257, 3072]) from checkpoint, the shape in current model is torch.Size([128256, 3072]).
size mismatch for base_model.model.model.embed_tokens.lora_embedding_A.default: copying a param with shape torch.Size([64, 128257]) from checkpoint, the shape in current model is torch.Size([64, 128256]).
size mismatch for base_model.model.lm_head.base_layer.weight: copying a param with shape torch.Size([128257, 3072]) from checkpoint, the shape in current model is torch.Size([128256, 3072]).
size mismatch for base_model.model.lm_head.lora_B.default.weight: copying a param with shape torch.Size([128257, 64]) from checkpoint, the shape in current model is torch.Size([128256, 64]).
Here is the training code I used:
import os
import json
import random
import torch
import argparse
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from datasets import Dataset
from peft import get_peft_model, LoraConfig
from torch.optim import AdamW
from accelerate import Accelerator

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=2)
parser.add_argument("--micro_batch_size", type=int, default=5)  # micro batch size
parser.add_argument("--lr", type=float, default=5e-5)
args = parser.parse_args()

# Model ID
model_id = 'llama-3.2-Korean-Bllossom-3B'

# Load config.json
config_file = f"{model_id}/config.json"
if not os.path.exists(config_file):
    raise FileNotFoundError(f"Config file not found: {config_file}")
with open(config_file, 'r') as f:
    config = json.load(f)

# Read max_position_embeddings, defaulting to 512
max_position_embeddings = config.get('max_position_embeddings', 512)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Add padding token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
pad_token_id = tokenizer.pad_token_id

# LoRA configuration
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "embed_tokens", "lm_head"]
)

# Load the model on CPU
model = AutoModelForCausalLM.from_pretrained(model_id)
model.resize_token_embeddings(len(tokenizer))
model = get_peft_model(model, lora_config)

# Set pad_token_id explicitly
model.generation_config.pad_token_id = pad_token_id

# Initialize Accelerator
accelerator = Accelerator()

# Prepare model with Accelerator
model, tokenizer = accelerator.prepare(model, tokenizer)

# Load processed_dataset.json
dataset_file = 'processed_dataset.json'
if not os.path.exists(dataset_file):
    raise FileNotFoundError(f"Dataset file not found: {dataset_file}")
with open(dataset_file, 'r', encoding='utf-8') as f:
    full_dataset = json.load(f)

# The data is a list of records
texts = [item['text'] for item in full_dataset]
random.shuffle(texts)

# Train/validation split
split_index = int(len(texts) * 0.8)
train_texts = texts[:split_index]
val_texts = texts[split_index:]

# Build datasets
train_dataset = Dataset.from_dict({"text": train_texts})
val_dataset = Dataset.from_dict({"text": val_texts})

# Remove unwanted strings
def remove_unwanted_strings(examples):
    examples['text'] = [text.replace('<>', '').replace('<>', '').strip() for text in examples['text']]
    return examples

# Apply string cleanup
train_dataset = train_dataset.map(remove_unwanted_strings, batched=True)
val_dataset = val_dataset.map(remove_unwanted_strings, batched=True)

# Data preprocessing function
def preprocess_function(examples):
    tokenized = tokenizer(
        examples["text"],
        max_length=512,
        truncation=True,
        padding="max_length",  # pad to max_length
        return_tensors='pt'
    )

    labels = tokenized["input_ids"].clone()
    labels[labels == pad_token_id] = -100  # set padding tokens to -100 to exclude them from the loss
    tokenized["labels"] = labels

    return tokenized

# Tokenize the datasets
tokenized_train_dataset = train_dataset.map(preprocess_function, batched=True)
tokenized_val_dataset = val_dataset.map(preprocess_function, batched=True)

# TrainingArguments
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=args.batch_size,
    num_train_epochs=5,
    learning_rate=args.lr,
    logging_dir='./logs',
    logging_steps=10,
    eval_strategy="epoch",  # evaluate every epoch
    save_strategy="epoch",  # save every epoch
    report_to="wandb",
    logging_first_step=True,
    bf16=True,  # enable bf16
    gradient_accumulation_steps=3,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

# Optimizer
optimizer = AdamW(model.parameters(), lr=training_args.learning_rate)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    optimizers=(optimizer, None),  # custom optimizer
)

# Gradient clipping
trainer.args.max_grad_norm = 1.0

# Start training
trainer.train()

# Save model and tokenizer after training
model.save_pretrained('./results')
tokenizer.save_pretrained('./results')

# Inference function
def infer(text):
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors='pt', max_length=128, truncation=True)
        inputs = {k: v.to(accelerator.device) for k, v in inputs.items()}
        outputs = model.generate(**inputs)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Clear memory during training
torch.cuda.empty_cache()
The code above has gone through many modifications; it was written with GPT-3, MS Copilot, and advice from many people.
Fine-tuning itself succeeded. For merging I used merge_peft_adapters.py from GitHub, with this command:
python merge_peft_adapters.py --base_model_name_or_path llama-3.2-Korean-Bllossom-3B --peft_model_path results/checkpoint-65 --device cpu
I ran it on CPU since the model is small, but the merge failed with the error above.


It looks like this error can be resolved, but you probably need to resize the base model's token embeddings first: your training script adds a [PAD] token, so the saved adapter expects a vocabulary of 128257, while the freshly loaded base model still has 128256.
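For example, here is a minimal sketch of that fix, to be applied inside the merge script before the adapter is loaded. The variable name base_model and the ./results tokenizer path are assumptions on my part, not the actual contents of merge_peft_adapters.py:

from transformers import AutoTokenizer

# Assumption: ./results is where tokenizer.save_pretrained() wrote the
# tokenizer that already contains the extra [PAD] token (vocab size 128257).
tokenizer = AutoTokenizer.from_pretrained("./results")

# Grow embed_tokens / lm_head of the freshly loaded base model from
# 128256 to 128257 so the adapter's state_dict shapes match.
base_model.resize_token_embeddings(len(tokenizer))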

That being said, the signatures of merge-related functions change from time to time, so existing scripts from GitHub may not work with current versions of PEFT.
If the merged model is only for your own use, it may be quicker to rewrite the script yourself, since all it needs to do is load, merge, and save; see the sketch below.
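For reference, a minimal load-merge-save sketch under those assumptions. The paths and the ./merged-model output directory are placeholders, and it assumes the [PAD]-augmented tokenizer was saved to ./results as in your training script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "llama-3.2-Korean-Bllossom-3B"
adapter_path = "results/checkpoint-65"
tokenizer_path = "./results"      # where tokenizer.save_pretrained() wrote the [PAD] token
output_path = "./merged-model"    # placeholder output directory

# Tokenizer that matches the fine-tuned checkpoint (vocab size 128257)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Load the base model (on CPU by default)
base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)

# Grow the embeddings to match the adapter checkpoint before loading it
base_model.resize_token_embeddings(len(tokenizer))

# Attach the LoRA adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()

# Save the merged model together with the matching tokenizer
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)

After this, the model in ./merged-model should load with plain AutoModelForCausalLM.from_pretrained, without PEFT.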