I’d like to make a 4-bit GPTQ version of DBRX, and I’ve tried doing what was shown in the HF blog. Specifically:
from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",
    desc_act=False,
    trust_remote_code=True,
)
and then:
quant_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map='auto',
    trust_remote_code=True,
)
That last line eventually fails with:
Traceback (most recent call last):
File "/data3/gptq/quantise.py", line 18, in <module>
quant_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map='auto', trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3592, in from_pretrained
hf_quantizer.postprocess_model(model)
File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/base.py", line 195, in postprocess_model
return self._process_model_after_weight_loading(model, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
File "/data3/gptq/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 398, in quantize_model
self.block_name_to_quantize = get_block_name_with_pattern(model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/utils.py", line 77, in get_block_name_with_pattern
raise ValueError("Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`")
ValueError: Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`
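For what it’s worth, the traceback points at `block_name_to_quantize`, and `GPTQConfig` does take a parameter of that name, so my next step was going to be passing DBRX’s decoder-block path explicitly. This is only a sketch: "transformer.blocks" is my guess at the right module path, the checkpoint name is just an example, and the meta-device load is there so the real module names can be checked without downloading any weights:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, GPTQConfig

model_id = "databricks/dbrx-instruct"  # example checkpoint, substitute your own

# Instantiate DBRX on the meta device (no weights downloaded) to see
# where its decoder blocks actually live:
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
with torch.device("meta"):
    empty_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
print([name for name, _ in empty_model.named_modules() if name.count(".") <= 1])

# Then pass that path explicitly, as the error message suggests:
quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",
    desc_act=False,
    block_name_to_quantize="transformer.blocks",  # my guess for DBRX; adjust from the printout
)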
I’m assuming at this point that I’ve missed something VERY basic, but I have no idea what… any insights welcome!