NotImplementedError: ggml_type 21 not implemented

I’m trying to run a 3-bit quantized GGUF of Llama 3.1 70B:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bartowski/Meta-Llama-3.1-70B-Instruct-GGUF"
filename = "Meta-Llama-3.1-70B-Instruct-IQ3_XS.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

and I’m getting the following error:

  File "***/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 510, in load_dequant_gguf_tensor
    raise NotImplementedError(
NotImplementedError: ggml_type 21 not implemented

I’m using

  • transformers 4.44.2
  • gguf 0.10.0

Am I doing something wrong? Thanks.

There are multiple issues here, but to put it simply: the GGUF files that transformers can load and the GGUF files that llama.cpp can load are currently, in effect, two different things.
Think of them as two dialects of the same format.
GGUF started out as llama.cpp’s own format, and transformers only implements dequantization for a subset of its quantization types. Your IQ3_XS file contains tensors stored as ggml_type 21 (one of the newer i-quant types, IQ3_S), which the transformers dequantizer in 4.44 does not handle, hence the error.
The file you are trying to use targets llama.cpp; you can usually tell from who published the quantization.
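As a rough illustration of what the error message refers to: the ggml_type number is just an index into llama.cpp’s quantization-type enum. The ID-to-name mapping below is a sketch based on my reading of the `GGML_TYPE_*` enum in ggml.h, and the exact values may drift between versions, so treat it as an assumption rather than a spec:

```python
# Partial map of ggml_type IDs to quantization names.
# NOTE: values assumed from llama.cpp's GGML_TYPE_* enum (ggml.h)
# at the time of writing; gaps (4, 5, 9, 15) are removed or
# intermediate types. Verify against your ggml/gguf version.
GGML_TYPE_NAMES = {
    0: "F32",
    1: "F16",
    2: "Q4_0",
    3: "Q4_1",
    6: "Q5_0",
    7: "Q5_1",
    8: "Q8_0",
    10: "Q2_K",
    11: "Q3_K",
    12: "Q4_K",
    13: "Q5_K",
    14: "Q6_K",
    16: "IQ2_XXS",
    17: "IQ2_XS",
    18: "IQ3_XXS",
    19: "IQ1_S",
    20: "IQ4_NL",
    21: "IQ3_S",   # the type your IQ3_XS file trips on
    22: "IQ2_S",
    23: "IQ4_XS",
}

print(GGML_TYPE_NAMES[21])  # IQ3_S
```

If you have the gguf Python package installed, `gguf.constants.GGMLQuantizationType(21).name` should give you the same answer directly from the library’s own enum.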

And since it is unlikely that anyone publishes GGUF files specifically for transformers, you can either use that GGUF with llama.cpp or find a non-GGUF quantization (such as GPTQ or bitsandbytes NF4) for transformers.

Understood, thanks!