I’m trying to run a 3-bit quantized model of Llama 3.1 70B:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "bartowski/Meta-Llama-3.1-70B-Instruct-GGUF"
filename = "Meta-Llama-3.1-70B-Instruct-IQ3_XS.gguf"
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
and I’m getting the following error:
File "***/lib/python3.11/site-packages/transformers/integrations/ggml.py", line 510, in load_dequant_gguf_tensor
raise NotImplementedError(
NotImplementedError: ggml_type 21 not implemented
I’m using
- transformers 4.44.2
- gguf 0.10.0
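From what I can tell, ggml_type 21 corresponds to IQ3_S in gguf’s GGMLQuantizationType enum (IQ3_XS files store most tensors as IQ3_S blocks), and that i-quant doesn’t seem to be handled by transformers’ GGUF dequantizer at this version. A minimal sketch of the type ids in question, with the values copied from the gguf constants as I read them (no imports needed):

```python
# Subset of ggml_type ids as defined in gguf's GGMLQuantizationType enum
# (values copied here by hand so this snippet is self-contained).
GGML_TYPE_NAMES = {
    8: "Q8_0",
    12: "Q4_K",
    14: "Q6_K",
    18: "IQ3_XXS",
    21: "IQ3_S",   # <- the unimplemented type from my traceback
    23: "IQ4_XS",
}

# Map the id from the NotImplementedError back to a quant name.
print("ggml_type 21 ->", GGML_TYPE_NAMES[21])
```

If that reading is right, a workaround might be to load a K-quant file (e.g. a Q4_K_M variant, assuming the repo publishes one) instead of the IQ3_XS file, so the i-quant dequantization path is never hit.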
Am I doing something wrong? Thanks.