Quantization GPTQ

Hi Team, I’m trying to quantize a 13B model on an A100 with the configuration below, and I tried the following options:

from transformers import GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False,
)

  1. Enforce batch_size = 16 or batch_size = 2 in the quantization config
  2. Set tokenizer.pad_token_id = tokenizer.eos_token_id (which is 2); a rough sketch of both steps follows below
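
For reference, a minimal sketch of what both attempts look like together; the model ID is just a placeholder for the actual 13B checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "my-org/my-13b-model"  # placeholder for the actual checkpoint

# option 2: point the pad token at EOS (id 2 for this tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id

# option 1: enforce the batch size (tried 16 and 2)
quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    desc_act=False,
)

# quantization runs during from_pretrained when a GPTQConfig is passed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)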

I observed that even if we explicitly enforce the batch size and set pad_token_id to a value other than None, neither setting is taken into account.

Can’t we set batch_size and pad_token_id to other values, or is this expected behavior with GPTQ? What is the reason behind this? Please suggest if there is any way to override the batch size in the config.

Could you kindly advise? I appreciate your kind support.
Thanks

Hi! Could you try passing the pad_token_id in the GPTQConfig quantization config? From reading the code, it seems this is the value that’s used during dataset preparation.
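
For example, a minimal sketch (the model ID is a placeholder):

from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="wikitext2",
    batch_size=16,
    pad_token_id=2,  # set the pad token here; this is what the dataset preparation reads
    desc_act=False,
)

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-13b-model",  # placeholder
    quantization_config=quantization_config,
    device_map="auto",
)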