Llama 2 fine-tuning giving error: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Hi All,

I ran all the steps of this notebook and I am getting this error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Thanks
Manoranjan

Did you change anything in the notebook? When are you seeing the error? Can you please share the full logs?

2023-07-20T16:06:32.043+05:30	Found 7 modules to quantize: ['q_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj', 'o_proj', 'up_proj']

2023-07-20T16:08:05.066+05:30	trainable params: 250,347,520 || all params: 6,922,327,040 || trainable%: 3.6165225733108386

2023-07-20T16:08:06.067+05:30	/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn(

2023-07-20T16:08:06.067+05:30	0%| | 0/2373 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/opt/ml/code/run_clm.py", line 253, in <module>
    main()
  File "/opt/ml/code/run_clm.py", line 249, in main
    training_function(args)
  File "/opt/ml/code/run_clm.py", line 212, in training_function
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in forward
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in <listcomp>
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Also, I am able to replicate this in a Jupyter notebook, but using the SFT Trainer I am able to train:

import transformers
from trl import SFTTrainer

# model, tokenizer and dataset are assumed to be loaded in earlier cells
tokenizer.pad_token = tokenizer.eos_token

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=1,
    output_dir="outputs",
    optim="paged_adamw_8bit",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    args=training_arguments,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# disable the KV cache during training; it is not needed and saves memory
model.config.use_cache = False
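
The snippet above stops before training is actually started; presumably the run is then kicked off with the usual Trainer call:

trainer.train()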

We can reproduce the issue. It's related to the 13B model's config. See 70B chat wrong shape? · Issue #423 · facebookresearch/llama · GitHub
7B works for me. Could you test 7B in the meantime?
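
(For reference, testing with 7B should only require swapping the checkpoint id the notebook passes to from_pretrained; the variable name here is an assumption about the notebook:)

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical variable; previously the 13B checkpoint id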

Also, I am able to replicate this in a Jupyter notebook, but using the SFT Trainer I am able to train.

Are you able to train with the 13B model?


Seriously!!! I've been comparing your code and mine for the last 4 hours… thanks again, as always, for helping…

I just realized I was using 7B for SFT :slight_smile:

It seems that you can manually change the config.json and set pretraining_tp to 1 and it will work.
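
If you prefer not to edit config.json on disk, a minimal sketch of the same fix is to override the value when loading the model; the checkpoint id below is an assumption, use whatever the notebook actually loads:

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-hf"  # assumed checkpoint id

# Load the config, force pretraining_tp back to 1 so the attention projections
# are not sliced at forward time, then load the weights with that config.
config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)

Passing pretraining_tp=1 directly as a keyword argument to from_pretrained should work as well, since unrecognized kwargs are used to update the config.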


I followed your article and the fixes above for fine-tuning Llama 2 7B on a custom dataset, but when I deployed to SageMaker and tried running it, it threw a "llama KeyError". Are there any special steps required for deploying Llama 2 to SageMaker?

Hi, I'm having the same problem. Where do I find the config.json file?