Mixtral training creates additional embedded token and head weights

rohit-upadhya · June 13, 2024, 3:09pm

Hello, I am trying to fine-tune a Mixtral-8x7B-Instruct-v0.1 model using 4-bit quantization. The fine-tuning completes without any issues, but when I try to reload the fine-tuned model, loading fails with the following errors:

size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32007, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

size mismatch for lm_head.weight: copying a param with shape torch.Size([32007, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

From the error, it looks as though the fine-tuning process added 7 new tokens (32007 vs. 32000 rows) without my ever specifying any. Why would that happen?
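Both tensors differ by exactly 7 rows while keeping the same hidden size of 4096, which is consistent with 7 tokens having been appended to the vocabulary (presumably IDs 32000 to 32006) and the embedding and output layers having been resized to match. A minimal plain-Python sketch that checks this arithmetic from the error text itself; the regex and the helper names `parse_shape` and `extra_rows` are illustrative, not part of any library:

```python
import re

# Matches one "size mismatch" line in the format printed by the loader above.
MISMATCH_RE = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<cur>[\d, ]+)\]\)"
)


def parse_shape(text):
    """Turn '32007, 4096' into the tuple (32007, 4096)."""
    return tuple(int(part) for part in text.split(","))


def extra_rows(error_text):
    """Map each mismatched parameter to (checkpoint rows - current model rows)."""
    result = {}
    for match in MISMATCH_RE.finditer(error_text):
        ckpt = parse_shape(match.group("ckpt"))
        cur = parse_shape(match.group("cur"))
        result[match.group("name")] = ckpt[0] - cur[0]
    return result


errors = """
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32007, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([32007, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
"""
print(extra_rows(errors))  # {'model.embed_tokens.weight': 7, 'lm_head.weight': 7}
```

Seeing the same row difference for both the input embeddings and `lm_head` points to a vocabulary resize rather than a hidden-size or architecture mismatch.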

Here is the tokenizer_config.json:

{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
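The `added_tokens_decoder` in this config lists only IDs 0 to 2, so this particular file does not account for IDs 32000 to 32006. One way to check whichever tokenizer config was saved alongside the fine-tuned checkpoint is to list every added token whose ID is at or above the base vocabulary size of 32000. A minimal sketch in plain Python, assuming newly registered tokens are serialized into `added_tokens_decoder` with their IDs; `tokens_beyond_base` and the `<extra_0>` token in the second example are hypothetical names used only for illustration:

```python
import json

# Rows in the base model's embedding matrix, taken from the error message above.
BASE_VOCAB_SIZE = 32000


def tokens_beyond_base(config_text, base_vocab_size=BASE_VOCAB_SIZE):
    """Return {id: content} for added tokens whose ID falls outside the base vocab."""
    config = json.loads(config_text)
    decoder = config.get("added_tokens_decoder", {})
    return {
        int(token_id): entry["content"]
        for token_id, entry in decoder.items()
        if int(token_id) >= base_vocab_size
    }


# Trimmed copy of the config posted above (only the field this check reads).
posted = '''{"added_tokens_decoder": {
  "0": {"content": "<unk>"}, "1": {"content": "<s>"}, "2": {"content": "</s>"}}}'''
print(tokens_beyond_base(posted))  # {}

# Hypothetical config in which one new token was registered during fine-tuning.
hypothetical = '''{"added_tokens_decoder": {
  "2": {"content": "</s>"}, "32000": {"content": "<extra_0>"}}}'''
print(tokens_beyond_base(hypothetical))  # {32000: '<extra_0>'}
```

If the checkpoint's tokenizer does list such tokens, one commonly suggested approach (not confirmed in this thread) is to load that tokenizer and call `model.resize_token_embeddings(len(tokenizer))` on the base model before loading the fine-tuned weights, so that both matrices have matching shapes.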

I am able to load the vanilla model and use it for inference.
