Converting and de-quantizing GGUF tensors…: 0%| | 1/292 [00:00<00:00, 2328.88it/s]
Traceback (most recent call last):
File "./tokenize_text.py", line 15, in <module>
model = AutoModel.from_pretrained(model_id, gguf_file=filename, local_files_only=True)
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3661, in from_pretrained
state_dict = load_gguf_checkpoint(gguf_path, return_tensors=True)["tensors"]
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 148, in load_gguf_checkpoint
weights = load_dequant_gguf_tensor(shape=shape, ggml_type=tensor.tensor_type, data=tensor.data)
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/integrations/ggml.py", line 493, in load_dequant_gguf_tensor
values = dequantize_q8_0(data)
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/integrations/ggml.py", line 335, in dequantize_q8_0
scales = np.frombuffer(data, dtype=np.float16).reshape(num_blocks, 1 + 16)[:, :1].astype(np.float32)
ValueError: cannot reshape array of size 279085056 into shape (3772,17)
Same symptoms: the tokenizer loads fine, but I get this reshape error when trying to create the model. I have tried several different GGUF files with both 4-bit and 8-bit quantization, but it doesn't seem to matter.
I'm not sure what I am doing wrong.
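For context on what the failing line is doing: a Q8_0 block in GGUF is 34 bytes, a 2-byte float16 scale followed by 32 signed 8-bit weights, so viewing the buffer as float16 and reshaping to `(num_blocks, 17)` only works when `num_blocks` matches the actual byte count. A minimal sketch of Q8_0 dequantization with NumPy (an illustration of the block layout, not the transformers implementation):

```python
import numpy as np

Q8_0_BLOCK_BYTES = 34  # 2-byte fp16 scale + 32 int8 quantized weights


def dequantize_q8_0(data: bytes) -> np.ndarray:
    """Dequantize a raw Q8_0 tensor buffer into float32 weights."""
    num_blocks = len(data) // Q8_0_BLOCK_BYTES
    blocks = np.frombuffer(data, dtype=np.uint8).reshape(num_blocks, Q8_0_BLOCK_BYTES)
    # First 2 bytes of each block: the per-block fp16 scale.
    scales = blocks[:, :2].copy().view(np.float16).astype(np.float32)
    # Remaining 32 bytes: signed 8-bit quantized values.
    quants = blocks[:, 2:].copy().view(np.int8).astype(np.float32)
    # Each weight is quant * scale; flatten back to a 1-D array.
    return (scales * quants).reshape(-1)
```

If `len(data)` is not a multiple of 34 (or `num_blocks` is derived from a shape that disagrees with the buffer), the reshape fails with exactly this kind of `ValueError`.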
@H4rryM3ss I was able to get past this error by cloning the huggingface/transformers repository and running the latest code from there. At least the model loads now.
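For anyone else hitting this, installing the latest transformers code from the repository can be done in one step with pip (a sketch, assuming git and pip are available):

```shell
# Install transformers from the main branch instead of the PyPI release
pip install --upgrade git+https://github.com/huggingface/transformers.git
```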
Hey @jpuser1, thanks for letting me know. I had worked around this by loading the model with Llama, but I will give what you described a try.