Completely new to quantisation...having trouble with DBRX

I’d like to make a 4-bit GPTQ version of DBRX…and have tried doing what was shown in the HF blog post. Specifically:

from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",
    desc_act=False,
    trust_remote_code=True,
)

and

quant_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map='auto', trust_remote_code=True)

This last line fails eventually with:

Traceback (most recent call last):
  File "/data3/gptq/quantise.py", line 18, in <module>
    quant_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map='auto', trust_remote_code=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3592, in from_pretrained
    hf_quantizer.postprocess_model(model)
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/base.py", line 195, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
    self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
  File "/data3/gptq/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 398, in quantize_model
    self.block_name_to_quantize = get_block_name_with_pattern(model)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data3/gptq/venv/lib/python3.11/site-packages/optimum/gptq/utils.py", line 77, in get_block_name_with_pattern
    raise ValueError("Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`")
ValueError: Block pattern could not be match. Pass `block_name_to_quantize` argument in `quantize_model`

I’m assuming at this point that I have missed something VERY basic, but I have no idea what…any insights welcome!
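One thing the ValueError itself suggests is passing block_name_to_quantize explicitly, which means driving optimum's GPTQQuantizer directly instead of going through from_pretrained. Below is a minimal sketch of that, assuming DBRX's decoder layers live under transformer.blocks (worth verifying against model.named_modules() first) and using databricks/dbrx-instruct as a placeholder model id:

# Sketch only: load DBRX unquantized first, then quantize with optimum,
# passing block_name_to_quantize as the error message asks for.
import torch
from optimum.gptq import GPTQQuantizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"  # placeholder; use whichever DBRX checkpoint you mean

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",
    group_size=128,
    desc_act=False,
    # Assumption: DBRX keeps its decoder layers under "transformer.blocks";
    # confirm with [name for name, _ in model.named_modules()].
    block_name_to_quantize="transformer.blocks",
)
quantized_model = quantizer.quantize_model(model, tokenizer)
quantizer.save(quantized_model, "dbrx-gptq-4bit")

Even with the block name supplied, DBRX's fused MoE expert layers may still trip up GPTQ, so treat this as a starting point rather than a known-good recipe.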

Turns out that DBRX support in AutoGPTQ is still in development.


Yes, this will be easier once DBRX is natively integrated into the Transformers library: Add DBRX Model by abhi-mosaic · Pull Request #29921 · huggingface/transformers · GitHub.

Then you can pass one of the supported quantization configs to the model: Quantization.
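For example, once a Transformers release includes native DBRX support, a 4-bit bitsandbytes load might look like this (the model id and 4-bit settings below are only illustrative):

# Sketch, assuming native DBRX support in Transformers (post PR #29921):
# load the model in 4-bit with bitsandbytes via a quantization config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "databricks/dbrx-instruct"  # or databricks/dbrx-base

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)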
