While I already have a CamemBERT model running for a sentence-classification task (mail classification), I would like to see how the Mistral 7B model performs on it. Since this model is quite different from BERT, it is hard to reuse the code template I used for BERT.
Hence, I am first working through tutorials, such as the following one. However, it takes about 10 hours to run, and there is no model.cuda() call in the code. When I add one, I get the following error:
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-4c99aa47a15f> in <cell line: 10>()
8 )
9
---> 10 trainer.train()
42 frames
/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py in forward(ctx, A, B, out, bias, quant_state)
514 # 1. Dequantize
515 # 2. MatmulnN
--> 516 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
517
518 # 3. Save state
RuntimeError: mat1 and mat2 shapes cannot be multiplied (8192x4096 and 1x8388608)
Could anyone help me with this issue? I would really appreciate it.
Note that this error also occurs with other tutorials, as well as with code snippets I wrote myself.
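For reference, my loading code looks roughly like this (a minimal sketch; the model name and quantization values follow the tutorial's defaults and are assumptions, not my exact notebook):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config, as in the QLoRA-style tutorials
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places the quantized weights on the GPU
)
model.cuda()  # <- the call I added; training then fails with the error above
```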
The code runs on Google Colab, with the following environment:
- `transformers` version: 4.38.0.dev0
- Platform: Linux-6.1.58+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): 2.15.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.8.0 (cpu)
- Jax version: 0.4.23
- JaxLib version: 0.4.23
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO
I am fine-tuning Llama 2 7B. After training, inference works fine in the same runtime, but if I save the model and then load it later for inference, it gives this error:
514         # 1. Dequantize
515         # 2. MatmulnN
--> 516     output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
517
518 # 3. Save state
RuntimeError: mat1 and mat2 shapes cannot be multiplied (9x4096 and 1x8388608)
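For context, my save/load flow is roughly the following (a sketch assuming a PEFT/LoRA fine-tune; the adapter path and base-model name are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# After training (in the training runtime), the adapter is saved with:
#   trainer.model.save_pretrained("./llama2-7b-adapter")

# Later, for inference: reload the 4-bit base model the same way it was
# loaded for training, then attach the saved adapter on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "./llama2-7b-adapter")
model.eval()
```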
At first glance, it looks like a bug in torch, but it might actually be a problem with bitsandbytes. If so, it might be fixable by updating or downgrading that library…
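For example, something like this in a Colab cell (the pinned version below is only an illustration of downgrading, not a known-good release):

```python
# Try the latest bitsandbytes first; if the error persists, pin an older release.
!pip install -U bitsandbytes
# or, to downgrade (version number is illustrative):
# !pip install bitsandbytes==0.41.3
```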