4-bit quantization

Is there a way to do 4-bit quantization using Optimum?

Hi @sameearif88, GPTQ will soon be available in Optimum to enable 4-bit quantization. You can follow the ongoing PR here: https://github.com/huggingface/optimum/pull/1216
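
For anyone wondering what 4-bit quantization does to the weights, here is a minimal sketch in plain Python. Note that this is simple round-to-nearest quantization, not GPTQ (GPTQ additionally uses calibration data to reduce the quantization error), and it is not Optimum's API. The function names `quantize_4bit`, `dequantize_4bit`, and `pack_nibbles` are made up for illustration.

```python
from typing import List, Tuple

LEVELS = 15  # 4 bits give 2**4 = 16 integer codes: 0..15


def quantize_4bit(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to 4-bit codes with a shared scale and offset (round-to-nearest)."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / LEVELS
    if scale == 0.0:
        scale = 1.0  # all weights identical; any non-zero scale works
    codes = [min(LEVELS, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_4bit(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale + w_min for c in codes]


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte (pads with a zero code if the count is odd)."""
    padded = codes + [0] if len(codes) % 2 else codes
    return bytes((hi << 4) | lo for hi, lo in zip(padded[0::2], padded[1::2]))


if __name__ == "__main__":
    weights = [-0.75, -0.1, 0.0, 0.3, 0.75]  # toy values, not from a real model
    codes, scale, w_min = quantize_4bit(weights)
    restored = dequantize_4bit(codes, scale, w_min)
    print("codes:", codes)
    print("restored:", restored)
    print("packed size in bytes:", len(pack_nibbles(codes)))
```

Each restored weight differs from the original by at most about half of `scale`, and the packed codes take half a byte per weight, plus the stored `scale` and `w_min`. Real implementations typically quantize in small groups of weights, each with its own scale, to keep that error low.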

Is it possible to quantize the OpenAI Whisper and Facebook MMS audio models to 4 bits using Optimum?


@sameearif88 GPTQ only works for text models at the moment, so it won’t be possible to perform 4-bit quantization of speech models right away.

@sameearif88 Feel free to open a feature request for it on GitHub.

