Enforce batch_size = 16 or batch_size = 2 in the quantization configuration
Set tokenizer.pad_token_id = tokenizer.eos_token_id (which is 2)
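Roughly, this is the kind of setup I mean (the model name and calibration dataset below are just placeholders, not my exact config):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# the model has no pad token by default, so reuse the EOS token (id 2)
tokenizer.pad_token_id = tokenizer.eos_token_id

quant_config = GPTQConfig(
    bits=4,
    dataset="c4",        # example calibration dataset
    tokenizer=tokenizer,
    batch_size=16,       # explicitly set, but it does not seem to be used
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
```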
I observed that even if we explicitly enforce the batch size and set pad_token_id to a value other than None, these settings are not taken into account.
Can't we set batch_size and pad_token_id to other values, or is this expected behavior with GPTQ? What is the reason behind this? Please suggest if there is any way to override the batch size in the config.
Hi! Could you try passing pad_token_id in the GPTQConfig quantization config? From reading the code, it seems this is the value that is used during dataset preparation.
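Something along these lines (a minimal sketch; the model id and 4-bit/c4 settings are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    batch_size=16,
    # pass the pad token id directly to the quantization config,
    # not only on the tokenizer
    pad_token_id=tokenizer.eos_token_id,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
```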