Hugging Face Transformers; Polyglot-12.8b (GPT-NeoX): size mismatch error telling me to add ignore_mismatched_sizes=True in from_pretrained, even though it is already set

System Info

[What I used]

  1. Polyglot-12.8b (GPT-NeoX based, EleutherAI/polyglot-ko-12.8b on the Hugging Face Hub)
  2. transformers version: 4.32.0.dev0
  3. trainer: the transformers run_clm_no_trainer.py example with accelerate (https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py)
  4. DeepSpeed ZeRO-3 (see the rough sketch after the code below for how it is set up)
  5. I added ignore_mismatched_sizes=True to from_pretrained:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    from_tf=bool(".ckpt" in args.model_name_or_path),
    config=config,
    low_cpu_mem_usage=args.low_cpu_mem_usage,
    ignore_mismatched_sizes=True,  # added
)
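
For reference, my ZeRO-3 setup is roughly equivalent to the sketch below. In practice it is configured through accelerate and a DeepSpeed config file, so the values here are placeholders rather than my exact settings:

from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Rough equivalent of my ZeRO-3 setup; the actual values come from my
# accelerate/DeepSpeed config, so treat these as placeholders.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,                   # ZeRO stage 3: model parameters are partitioned across ranks
    gradient_accumulation_steps=1,  # placeholder
)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)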

[What I did]

  1. Fine-tuned Polyglot-12.8b; this run worked.
  2. Tried to fine-tune the model saved in step 1 again, and the following error occurred:
        size mismatch for gpt_neox.layers.38.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([20480, 5120]).
        size mismatch for gpt_neox.layers.38.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 20480]).
        size mismatch for gpt_neox.layers.39.attention.query_key_value.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 5120]).
        size mismatch for gpt_neox.layers.39.attention.dense.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
        size mismatch for gpt_neox.layers.39.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([20480, 5120]).
        size mismatch for gpt_neox.layers.39.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([5120, 20480]).
        size mismatch for embed_out.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([30003, 5120]).
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
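
To make the symptom concrete: the checkpoint written by the first run seems to contain empty placeholder tensors for the parameters listed above. A minimal way to check this, assuming the first run's output directory contains a single pytorch_model.bin (adjust the path and sharding to your own output):

import torch

# Hypothetical path to the output of the first fine-tuning run.
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")

# List parameters that were saved with zero elements, i.e. shape torch.Size([0]).
empty = [name for name, tensor in state_dict.items() if tensor.numel() == 0]
print(f"{len(empty)} empty tensors, e.g.: {empty[:5]}")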

My transformers version is the latest (installed from main), and I already pass ignore_mismatched_sizes=True, but the error still occurs.

Does anyone know how to solve this?