Size mismatch error for LLM checkpoint of PEFT model with resized token embeddings

I have started training the Llama 3.1 8B model using Unsloth. I made some changes in the code because I am training on data in a new language, i.e. I added tokens to the tokenizer and resized the model's token embeddings. When I load the checkpoint with Transformers' AutoModelForCausalLM, it gives me a size mismatch error. Can anyone explain this?

Did you find any solution?

I was using AutoModelForCausalLM to load the fine-tuned model. I don't know why, but when I used AutoPeftModelForCausalLM instead of AutoModelForCausalLM, it worked. During training I had optimized the model with PEFT, by the way.
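For reference, loading through PEFT looks roughly like this. This is a minimal sketch: checkpoint_path is a placeholder for the fine-tuned checkpoint directory, and it assumes that directory contains the adapter files plus the saved tokenizer.

    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Placeholder: directory produced during fine-tuning
    # (adapter_config.json, adapter weights, tokenizer files).
    checkpoint_path = "path/to/checkpoint"

    # AutoPeftModelForCausalLM reads adapter_config.json, loads the base model
    # it points to, and then attaches the trained adapter on top of it.
    model = AutoPeftModelForCausalLM.from_pretrained(checkpoint_path)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

    # Quick sanity check that embeddings and vocabulary are in sync.
    print(model.get_input_embeddings().weight.shape[0], len(tokenizer))

If the base model's embeddings were resized before training, the adapter config also has to point at the resized base model (see the last reply below).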

Hi, @pranil51

The size mismatch error is most likely because the tokenizer and the model are out of sync after the new tokens were added. When you add tokens to the tokenizer, you also need to resize the model's embedding layer to match the updated vocabulary size, and any checkpoint has to be loaded into a model that was resized the same way. Here's a step-by-step guide to resolve the issue (a runnable sketch putting the steps together follows the list):

  1. Add Tokens to the Tokenizer:
    Ensure you have added the new tokens to the tokenizer correctly:

    tokenizer.add_tokens(new_tokens)
    
  2. Resize the Token Embeddings:
    After modifying the tokenizer, you need to resize the token embeddings in the model to accommodate the new vocabulary size:

    model.resize_token_embeddings(len(tokenizer))
    
  3. Checkpoint Loading:
    The error appears when the embedding shapes stored in the checkpoint don't match the embeddings of the model you are loading them into. If you want to continue training from a checkpoint, ensure you:

    • Save the resized model (after resizing embeddings) to update the checkpoint.
    • Load the checkpoint only after resizing the embeddings:
      model.resize_token_embeddings(len(tokenizer))
      model.load_state_dict(torch.load(checkpoint_path), strict=False)
      

    Passing strict=False tells PyTorch to tolerate missing or unexpected keys in the state dict; the resize in the previous step is what actually prevents the shape mismatch, and any embedding rows not covered by the checkpoint keep their freshly initialized values.

  4. Initialize New Embeddings:
    Newly added embeddings are initialized randomly. For better results, you can manually initialize them based on pre-trained embeddings (e.g., averaging existing embeddings).

  5. Verify Consistency:
    Double-check that the tokenizer and model are saved and loaded together to avoid desynchronization.
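Putting the steps together, here is a minimal end-to-end sketch. The model id, the new tokens, and the output directory are placeholders rather than values from the original post, and the mean-of-existing-embeddings initialization in step 4 is just one common heuristic:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model_id = "meta-llama/Llama-3.1-8B"     # placeholder base model
    new_tokens = ["<new_tok_1>", "<new_tok_2>"]   # placeholder new-language tokens
    save_dir = "llama-3.1-8b-extended"            # placeholder output directory

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")

    # 1. Add the new tokens to the tokenizer.
    num_added = tokenizer.add_tokens(new_tokens)

    # 2. Resize the embedding matrix (and LM head) to the new vocabulary size.
    model.resize_token_embeddings(len(tokenizer))

    # 4. Initialize the new rows from the mean of the pre-trained embeddings
    #    instead of leaving them randomly initialized.
    if num_added > 0:
        with torch.no_grad():
            in_emb = model.get_input_embeddings().weight
            out_emb = model.get_output_embeddings().weight
            in_emb[-num_added:] = in_emb[:-num_added].mean(dim=0, keepdim=True)
            out_emb[-num_added:] = out_emb[:-num_added].mean(dim=0, keepdim=True)

    # 5. Save model and tokenizer together so they never go out of sync.
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)

Any later checkpoint should then be loaded on top of this resized model, as in step 3.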

Hope this helps!

Yes. Save the base model once it has been resized and update its location in the adapter config.
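Concretely, that means saving the resized base model to its own directory and pointing the adapter's config at it, roughly like this (the directory names are placeholders; base_model_name_or_path is the field in adapter_config.json that PEFT uses to locate the base model):

    import json
    import os

    from peft import AutoPeftModelForCausalLM

    resized_base_dir = "llama-3.1-8b-extended"  # base model saved after resize_token_embeddings
    adapter_dir = "outputs/checkpoint-500"      # placeholder adapter/checkpoint directory

    # Re-point the adapter at the resized base model instead of the original
    # hub id, so loaders rebuild the base with the enlarged embedding matrix
    # before attaching the adapter.
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_path) as f:
        adapter_config = json.load(f)
    adapter_config["base_model_name_or_path"] = resized_base_dir
    with open(config_path, "w") as f:
        json.dump(adapter_config, f, indent=2)

    # Now the checkpoint loads without a size mismatch.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)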
