Hugging Face Transformers; Polyglot-12.8b (GPT-NeoX): size mismatch error telling me to add ignore_mismatched_sizes=True in from_pretrained, even though it is already set

System Info

[What I used]

  1. Polyglot-12.8b (GPT-NeoX based, EleutherAI/polyglot-ko-12.8b on the Hugging Face Hub)
  2. transformers version: 4.32.0.dev0
  3. trainer: the transformers run_clm_no_trainer.py example with accelerate (https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py)
  4. DeepSpeed ZeRO-3 (see the rough sketch after the code below for how it is set up)
  5. I added ignore_mismatched_sizes=True to from_pretrained:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    from_tf=bool(".ckpt" in args.model_name_or_path),
    config=config,
    low_cpu_mem_usage=args.low_cpu_mem_usage,
    ignore_mismatched_sizes=True,  # added
)
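
For reference, my ZeRO-3 setup is roughly equivalent to the sketch below. In practice it is configured through accelerate and a DeepSpeed config file, so the values here are placeholders rather than my exact settings:

from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Rough equivalent of my ZeRO-3 setup; the actual values come from my
# accelerate/DeepSpeed config, so treat these as placeholders.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,                   # ZeRO stage 3: model parameters are partitioned across ranks
    gradient_accumulation_steps=1,  # placeholder
)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)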

[What I did]

  1. Fine-tuned Polyglot-12.8b; this run worked.
  2. Tried to fine-tune the model saved in step 1 again, and the following error occurred:
        size mismatch for gpt_neox.layers.38.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([20480, 5120]).
        size mismatch for gpt_neox.layers.38.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 20480]).
        size mismatch for gpt_neox.layers.39.attention.query_key_value.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 5120]).
        size mismatch for gpt_neox.layers.39.attention.dense.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
        size mismatch for gpt_neox.layers.39.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([20480, 5120]).
        size mismatch for gpt_neox.layers.39.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 20480]).
        size mismatch for embed_out.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([30003, 5120]).
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
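
To make the symptom concrete: the checkpoint written by the first run seems to contain empty placeholder tensors for the parameters listed above. A minimal way to check this, assuming the first run's output directory contains a single pytorch_model.bin (adjust the path and sharding to your own output):

import torch

# Hypothetical path to the output of the first fine-tuning run.
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")

# List parameters that were saved with zero elements, i.e. shape torch.Size([0]).
empty = [name for name, tensor in state_dict.items() if tensor.numel() == 0]
print(f"{len(empty)} empty tensors, e.g.: {empty[:5]}")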

My transformers version is the latest (installed from main), and I already pass ignore_mismatched_sizes=True, but the error still occurs.

Does anyone know how to solve this?