Size mismatch error for LLM checkpoint of PEFT model with resized token embeddings

I have started training the Llama 3.1 8B model using Unsloth. I made some changes in the code because I am training on data in a new language, i.e. I added tokens to the tokenizer and resized the model's token embeddings. When I load the checkpoint with Transformers' AutoModelForCausalLM, it gives me a size mismatch error. Can anyone explain this?

Did you find any solution?

I was using AutoModelForCausalLM to load the fine-tuned model. I don't know why, but when I used AutoPeftModelForCausalLM instead of AutoModelForCausalLM, it worked. During training I had optimized the model with PEFT, by the way.
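For reference, loading through PEFT looks roughly like this. This is a minimal sketch: checkpoint_path is a placeholder for the fine-tuned checkpoint directory, and it assumes that directory contains the adapter files plus the saved tokenizer.

    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Placeholder: directory produced during fine-tuning
    # (adapter_config.json, adapter weights, tokenizer files).
    checkpoint_path = "path/to/checkpoint"

    # AutoPeftModelForCausalLM reads adapter_config.json, loads the base model
    # it points to, and then attaches the trained adapter on top of it.
    model = AutoPeftModelForCausalLM.from_pretrained(checkpoint_path)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

    # Quick sanity check that embeddings and vocabulary are in sync.
    print(model.get_input_embeddings().weight.shape[0], len(tokenizer))

If the base model's embeddings were resized before training, the adapter config also has to point at the resized base model (see the last reply below).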

Hi, @pranil51

The size mismatch error is most likely because the tokenizer and the model are out of sync after the new tokens were added. When you add tokens to the tokenizer, you also need to resize the model's embedding layer to match the updated vocabulary size, and any checkpoint has to be loaded into a model that was resized the same way. Here's a step-by-step guide to resolve the issue (a runnable sketch putting the steps together follows the list):

  1. Add Tokens to the Tokenizer:
    Ensure you have added the new tokens to the tokenizer correctly:

    tokenizer.add_tokens(new_tokens)
    
  2. Resize the Token Embeddings:
    After modifying the tokenizer, you need to resize the token embeddings in the model to accommodate the new vocabulary size:

    model.resize_token_embeddings(len(tokenizer))
    
  3. Checkpoint Loading:
    The error appears when the embedding shapes stored in the checkpoint don't match the embeddings of the model you are loading them into. If you want to continue training from a checkpoint, ensure you:

    • Save the resized model (after resizing embeddings) to update the checkpoint.
    • Load the checkpoint only after resizing the embeddings:
      model.resize_token_embeddings(len(tokenizer))
      model.load_state_dict(torch.load(checkpoint_path), strict=False)
      

    Passing strict=False tells PyTorch to tolerate missing or unexpected keys in the state dict; the resize in the previous step is what actually prevents the shape mismatch, and any embedding rows not covered by the checkpoint keep their freshly initialized values.

  4. Initialize New Embeddings:
    Newly added embeddings are initialized randomly. For better results, you can manually initialize them based on pre-trained embeddings (e.g., averaging existing embeddings).

  5. Verify Consistency:
    Double-check that the tokenizer and model are saved and loaded together to avoid desynchronization.
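Putting the steps together, here is a minimal end-to-end sketch. The model id, the new tokens, and the output directory are placeholders rather than values from the original post, and the mean-of-existing-embeddings initialization in step 4 is just one common heuristic:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model_id = "meta-llama/Llama-3.1-8B"     # placeholder base model
    new_tokens = ["<new_tok_1>", "<new_tok_2>"]   # placeholder new-language tokens
    save_dir = "llama-3.1-8b-extended"            # placeholder output directory

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")

    # 1. Add the new tokens to the tokenizer.
    num_added = tokenizer.add_tokens(new_tokens)

    # 2. Resize the embedding matrix (and LM head) to the new vocabulary size.
    model.resize_token_embeddings(len(tokenizer))

    # 4. Initialize the new rows from the mean of the pre-trained embeddings
    #    instead of leaving them randomly initialized.
    if num_added > 0:
        with torch.no_grad():
            in_emb = model.get_input_embeddings().weight
            out_emb = model.get_output_embeddings().weight
            in_emb[-num_added:] = in_emb[:-num_added].mean(dim=0, keepdim=True)
            out_emb[-num_added:] = out_emb[:-num_added].mean(dim=0, keepdim=True)

    # 5. Save model and tokenizer together so they never go out of sync.
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)

Any later checkpoint should then be loaded on top of this resized model, as in step 3.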

Hope this helps!

Yes. Save the base model once it has been resized and update its location in the adapter config.
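Concretely, that means saving the resized base model to its own directory and pointing the adapter's config at it, roughly like this (the directory names are placeholders; base_model_name_or_path is the field in adapter_config.json that PEFT uses to locate the base model):

    import json
    import os

    from peft import AutoPeftModelForCausalLM

    resized_base_dir = "llama-3.1-8b-extended"  # base model saved after resize_token_embeddings
    adapter_dir = "outputs/checkpoint-500"      # placeholder adapter/checkpoint directory

    # Re-point the adapter at the resized base model instead of the original
    # hub id, so loaders rebuild the base with the enlarged embedding matrix
    # before attaching the adapter.
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_path) as f:
        adapter_config = json.load(f)
    adapter_config["base_model_name_or_path"] = resized_base_dir
    with open(config_path, "w") as f:
        json.dump(adapter_config, f, indent=2)

    # Now the checkpoint loads without a size mismatch.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)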
