Completely new to quantisation...having trouble with DBRX

I’d like to make a 4-bit GPTQ version of DBRX…and have tried doing what was shown in the HF blog post. Specifically:

from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",
    desc_act=False,
    trust_remote_code=True,
)

and

quant_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map='auto', trust_remote_code=True)

This last line fails eventually with:

Traceback (most recent call last):
  File "/data3/gptq/quantise.py", line 18, in <module>
    quant_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map='auto', trust_remote_code=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3592, in from_pretrained
    hf_quantizer.postprocess_model(model)
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/base.py", line 195, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
    self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
  File "/data3/gptq/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 398, in quantize_model
    self.block_name_to_quantize = get_block_name_with_pattern(model)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/utils.py", line 77, in get_block_name_with_pattern
    raise ValueError("Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`")
ValueError: Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`

I’m assuming at this point that I have missed something VERY basic, but I have no idea what…any insights welcome!
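One thing the ValueError itself suggests is passing block_name_to_quantize explicitly, which means driving optimum's GPTQQuantizer directly instead of going through from_pretrained. Below is a minimal sketch of that, assuming DBRX's decoder layers live under transformer.blocks (worth verifying against model.named_modules() first) and using databricks/dbrx-instruct as a placeholder model id:

# Sketch only: load DBRX unquantized first, then quantize with optimum,
# passing block_name_to_quantize as the error message asks for.
import torch
from optimum.gptq import GPTQQuantizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # placeholder; use whichever DBRX checkpoint you mean

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",
    group_size=128,
    desc_act=False,
    # Assumption: DBRX keeps its decoder layers under "transformer.blocks";
    # confirm with [name for name, _ in model.named_modules()].
    block_name_to_quantize="transformer.blocks",
)
quantized_model = quantizer.quantize_model(model, tokenizer)
quantizer.save(quantized_model, "dbrx-gptq-4bit")

Even with the block name supplied, DBRX's fused MoE expert layers may still trip up GPTQ, so treat this as a starting point rather than a known-good recipe.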

Turns out that DBRX support in AutoGPTQ is still in development.


Yes, this will be easier once DBRX is natively integrated into the Transformers library: Add DBRX Model by abhi-mosaic · Pull Request #29921 · huggingface/transformers · GitHub.

Then you can pass one of the supported quantization configs to the model: Quantization.
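For example, once a Transformers release includes native DBRX support, a 4-bit bitsandbytes load might look like this (the model id and 4-bit settings below are only illustrative):

# Sketch, assuming native DBRX support in Transformers (post PR #29921):
# load the model in 4-bit with bitsandbytes via a quantization config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "databricks/dbrx-instruct"  # or databricks/dbrx-base

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)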
