Llama 2 fine-tuning giving error: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Hi All,

I ran all the steps of this notebook and I am getting this error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Thanks
Manoranjan

Did you change anything in the notebook? When are you seeing the error? Can you please share the full logs?

2023-07-20T16:06:32.043+05:30	Found 7 modules to quantize: ['q_proj', 'v_proj', 'k_proj', 'down_proj', 'gate_proj', 'o_proj', 'up_proj']

2023-07-20T16:08:05.066+05:30	trainable params: 250,347,520 || all params: 6,922,327,040 || trainable%: 3.6165225733108386

2023-07-20T16:08:06.067+05:30	/opt/conda/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn(

2023-07-20T16:08:06.067+05:30	0%| | 0/2373 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/opt/ml/code/run_clm.py", line 253, in <module>
    main()
  File "/opt/ml/code/run_clm.py", line 249, in main
    training_function(args)
  File "/opt/ml/code/run_clm.py", line 212, in training_function
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in forward
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 295, in <listcomp>
    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x5120 and 1x2560)

Also, I am able to replicate this in a Jupyter notebook, but using the SFT Trainer I am able to train:

import transformers
from trl import SFTTrainer

# model, tokenizer and dataset are assumed to be loaded in earlier cells
tokenizer.pad_token = tokenizer.eos_token

training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=1,
    output_dir="outputs",
    optim="paged_adamw_8bit",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    args=training_arguments,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# disable the KV cache during training; it is not needed and saves memory
model.config.use_cache = False
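
The snippet above stops before training is actually started; presumably the run is then kicked off with the usual Trainer call:

trainer.train()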

We can reproduce the issue. It's related to the 13B model's config. See 70B chat wrong shape? · Issue #423 · facebookresearch/llama · GitHub
7B works for me. Could you test 7B in the meantime?
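
(For reference, testing with 7B should only require swapping the checkpoint id the notebook passes to from_pretrained; the variable name here is an assumption about the notebook:)

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical variable; previously the 13B checkpoint id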

Also, I am able to replicate this in a Jupyter notebook, but using the SFT Trainer I am able to train.

Are you able to train with the 13B model?


Seriously!!! I've been comparing your code and mine for the last 4 hours… thanks again, as always, for helping…

I just realized I was using 7B for SFT :slight_smile:

It seems that you can manually change the config.json and set pretraining_tp to 1 and it will work.
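
If you prefer not to edit config.json on disk, a minimal sketch of the same fix is to override the value when loading the model; the checkpoint id below is an assumption, use whatever the notebook actually loads:

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-hf"  # assumed checkpoint id

# Load the config, force pretraining_tp back to 1 so the attention projections
# are not sliced at forward time, then load the weights with that config.
config = AutoConfig.from_pretrained(model_id)
config.pretraining_tp = 1
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)

Passing pretraining_tp=1 directly as a keyword argument to from_pretrained should work as well, since unrecognized kwargs are used to update the config.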


I followed your article and the fixes above for fine-tuning Llama 2 7B on a custom dataset, but when I deployed to SageMaker and tried running it, it threw a "llama KeyError". Are there any special steps required for deploying Llama 2 to SageMaker?

Hi, I'm having the same problem. Where do I find the config.json file?