ValueError: cannot reshape array of size (GGUF)

Hey guys,

I’m trying to load the GGUF model dolphin-2.0-mistral-7b.Q4_K_M.gguf with the following Python code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "TheBloke/dolphin-2.0-mistral-7B-GGUF"
model_file = "dolphin-2.0-mistral-7b.Q4_K_M.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_name, gguf_file=model_file)
model = AutoModelForCausalLM.from_pretrained(model_name, gguf_file=model_file).to(device)

It loads the tokenizer, but when it tries to load the model I get:

ValueError: cannot reshape array of size 36864000 into shape (222,72)

This crashes the application.

Am I loading it incorrectly? I was following this guide.

I’m also seeing the same type of error:

from transformers import AutoTokenizer, AutoModel

filename = "Meta-Llama-3.1-8B-Instruct-Q8_0.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModel.from_pretrained(model_id, gguf_file=filename, local_files_only=True)

Converting and de-quantizing GGUF tensors…: 0%| | 1/292 [00:00<00:00, 2328.88it/s]
Traceback (most recent call last):
  File "./tokenize_text.py", line 15, in <module>
    model = AutoModel.from_pretrained(model_id, gguf_file=filename, local_files_only=True)
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3661, in from_pretrained
    state_dict = load_gguf_checkpoint(gguf_path, return_tensors=True)["tensors"]
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 148, in load_gguf_checkpoint
    weights = load_dequant_gguf_tensor(shape=shape, ggml_type=tensor.tensor_type, data=tensor.data)
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/integrations/ggml.py", line 493, in load_dequant_gguf_tensor
    values = dequantize_q8_0(data)
  File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/integrations/ggml.py", line 335, in dequantize_q8_0
    scales = np.frombuffer(data, dtype=np.float16).reshape(num_blocks, 1 + 16)[:, :1].astype(np.float32)
ValueError: cannot reshape array of size 279085056 into shape (3772,17)

Same symptoms: the tokenizer loads fine, but I get this reshape error when trying to create the model. I’ve tried several different GGUF files with both 4-bit and 8-bit quantization, and it doesn’t seem to matter.
I’m not sure what I’m doing wrong.

Possibly due to this bug:

@H4rryM3ss I was able to get past this error by cloning the huggingface/transformers repository and running the latest code from there. At least the model loads now.
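Since the fix was apparently on the transformers main branch, a quick sanity check before digging further is to compare the installed version against a recent release. A minimal sketch, assuming the fix shipped around 4.45.0 (that threshold is my guess, not a confirmed fix version):

```python
# Sketch: check whether the installed transformers release may predate the
# GGUF de-quantization fix. The 4.45.0 threshold is an assumption.

def version_tuple(v):
    """Parse '4.44.2' into (4, 44, 2), ignoring any pre-release suffix."""
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

try:
    import transformers
    if version_tuple(transformers.__version__) < version_tuple("4.45.0"):
        # Installing from source picks up unreleased fixes:
        #   pip install git+https://github.com/huggingface/transformers
        print("transformers", transformers.__version__, "may predate the fix")
    else:
        print("transformers", transformers.__version__, "looks recent enough")
except ImportError:
    print("transformers is not installed")
```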

Hey @jpuser1, thanks for letting me know. I had worked around this by loading the model with Llama, but I’ll give what you described a try.
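If by Llama you mean llama-cpp-python’s Llama class (my assumption), that workaround might look roughly like this; llama.cpp reads GGUF natively, so transformers’ de-quantization path is never involved:

```python
# Assumption: "Llama" refers to llama-cpp-python's Llama class, which loads
# GGUF files directly and never hits transformers' reshape/de-quantize code.
MODEL_PATH = "dolphin-2.0-mistral-7b.Q4_K_M.gguf"  # local GGUF file

try:
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)  # n_ctx: context window
    out = llm("Hello!", max_tokens=32)
    print(out["choices"][0]["text"])
except ImportError:
    print("llama-cpp-python not installed: pip install llama-cpp-python")
```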

Regards.