I am trying to find a way to configure the group size (a.k.a. block size) of NF4 quantization.
As I understand it, NF4 quantization is done through `BitsAndBytesConfig`.
Looking at this documentation page:
Some configs, like `AwqConfig` and `GPTQConfig`, do have a `group_size` arg, but `BitsAndBytesConfig` doesn't.
How can I find out what group size `BitsAndBytesConfig` uses, and how can I modify it?
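To be clear about what I mean by group size (block size), here is a toy sketch in plain Python. It is only an illustration of blockwise absmax scaling, not the bitsandbytes implementation; the function name `blockwise_absmax` and the `block_size` parameter are made up for this example. Each block of weights gets its own scale, and the block size controls how many weights share one scale.

```python
# Toy illustration of blockwise absmax scaling.
# This is NOT the bitsandbytes implementation; names here are hypothetical.

def blockwise_absmax(weights, block_size):
    """Split a flat list of weights into blocks; return one scale per block
    and the weights normalized by their block's scale."""
    scales = []
    normalized = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # One absmax scale per block; fall back to 1.0 for an all-zero block.
        scale = max(abs(w) for w in block) or 1.0
        scales.append(scale)
        normalized.extend(w / scale for w in block)
    return scales, normalized


weights = [0.5, -1.0, 0.25, 2.0, -4.0, 1.0, 0.1, -0.2]
scales, normalized = blockwise_absmax(weights, block_size=4)
print(scales)      # one scale per block of 4 weights
print(normalized)  # every value now lies in [-1, 1]
```

In a 4-bit scheme such as NF4, the normalized values would then be mapped to one of 16 levels. A smaller block size means more scales to store but a tighter fit per block, which is why I would like to control it.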