Hi! I’ve been excited about the recent auto-gptq quantization integration and appreciate the work folks did to make it happen. I was playing around a bit and came across this line,
which seems to imply that auto-gptq only supports float16, not bfloat16. I uncommented this line and things seem to be working fine; do folks have thoughts on this? TheBloke sometimes seems to run quantization with bfloat16 ([BUG]CUDA OUT OF MEMORY · Issue #179 · PanQiWei/AutoGPTQ · GitHub), so I’m not sure why this line is here, but I could be missing something (e.g., maybe some of the CUDA kernels are float16-only)?