Quantization GPTQ

Hi Team, I’m trying to quantize a 13B model on an A100 with the configuration below, and I tried the following options:

from transformers import GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False,
)

  1. Enforce batch_size = 16 or batch_size = 2 in the quantization config
  2. Set tokenizer.pad_token_id = tokenizer.eos_token_id (which is 2); a rough sketch of both steps follows below
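
For reference, a minimal sketch of what both attempts look like together; the model ID is just a placeholder for the actual 13B checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "my-org/my-13b-model"  # placeholder for the actual checkpoint

# option 2: point the pad token at EOS (id 2 for this tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id

# option 1: enforce the batch size (tried 16 and 2)
quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False,
)

# quantization runs during from_pretrained when a GPTQConfig is passed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)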

I observed that even if we explicitly enforce the batch size and set pad_token_id to a value other than None, neither setting is taken into account.

Can’t we set batch_size and pad_token_id to other values, or is this expected behavior with GPTQ? What is the reason behind this? Please suggest if there is any way to override the batch size in the config.

Could you kindly advise? I appreciate your kind support.
Thanks

Hi! Could you try passing the pad_token_id in the GPTQConfig quantization config? From reading the code, it seems this is the value that’s used during dataset preparation.
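
For example, a minimal sketch (the model ID is a placeholder):

from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    pad_token_id=2,  # set the pad token here; this is what the dataset preparation reads
    desc_act=False,
)

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-13b-model",  # placeholder
    quantization_config=quantization_config,
    device_map="auto",
)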