Hi All,
I ran all the steps of this notebook and I am getting an error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)
Thanks,
Manoranjan
Did you change anything in the notebook? When are you seeing the error? Can you please share the full logs?
Found 7 modules to quantize: ['q_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj', 'o_proj', 'up_proj']
trainable params: 250,347,520 || all params: 6,922,327,040 || trainable%: 3.6165225733108386
/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|          | 0/2373 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/ml/code/run_clm.py", line 253, in <module>
    main()
  File "/opt/ml/code/run_clm.py", line 249, in main
    training_function(args)
  File "/opt/ml/code/run_clm.py", line 212, in training_function
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in forward
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in <listcomp>
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)
Also, I am able to replicate this in a Jupyter notebook, but using SFTTrainer I am able to train:
import transformers
from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=1,
    output_dir="outputs",
    optim="paged_adamw_8bit",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    args=training_arguments,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # disable the KV cache during training; re-enable it for inference
trainer.train()
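For context, here is a rough sketch of how the model, tokenizer, and dataset used above might be set up (4-bit quantization plus LoRA). The model id and data file are placeholders, not the exact values from the notebook:

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

# load the base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# attach LoRA adapters to the attention and MLP projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# any dataset with a "text" column works with dataset_text_field="text"
dataset = load_dataset("json", data_files="train.json", split="train")  # placeholder data file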
We can reproduce the issue. It's related to the 13B model's config. See 70B chat wrong shape? · Issue #423 · facebookresearch/llama · GitHub
7B works for me. Could you test 7B in the meantime?
"Also, I am able to replicate this in a Jupyter notebook, but using SFTTrainer I am able to train."
Were you able to train with the 13B model?
Seriously!!! I've been comparing your code and mine for the last 4 hours… thanks again, as always, for helping…
I just realized I was using 7B for SFT.
It seems that you can manually change the config.json and set pretraining_tp to 1 and it will work.
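For reference, a minimal sketch of applying that fix at load time instead of editing config.json by hand (the model id is a placeholder; use whatever checkpoint you are loading):

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder checkpoint

config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1  # disable the tensor-parallel slicing that triggers the shape mismatch
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)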
I followed your article and the above fixes for fine-tuning LLaMA 2 7B on a custom dataset, but when I deployed to SageMaker and tried running it, it threw a “llama keyerror” error. Are there any special steps required for deploying LLaMA 2 to SageMaker?
Hi, I'm having the same problem. Where do I find the config.json file?