4-bit quantization

sameearif88 · August 7, 2023, 7:51pm

Is there a way to do 4-bit quantization using Optimum?

regisss · August 9, 2023, 4:38pm

Hi @sameearif88, GPTQ will soon be available in Optimum to enable 4-bit quantization. You can follow the ongoing PR here: https://github.com/huggingface/optimum/pull/1216
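
For readers new to the term, here is a rough sketch of what "4-bit" storage means: each weight is mapped to one of 16 integer codes plus a shared scale and offset. This is plain round-to-nearest quantization, not GPTQ itself (GPTQ picks the codes more carefully using calibration data), and the function names `quantize_4bit` and `dequantize_4bit` are made up for illustration; they are not part of Optimum.

```python
def quantize_4bit(weights):
    """Round-to-nearest asymmetric quantization to 4-bit codes (0..15).

    Illustrative helper only; not an Optimum or GPTQ API.
    """
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, i.e. 15 steps between the min and max weight.
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


weights = [-0.75, -0.2, 0.0, 0.31, 0.75]
codes, scale, zero = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, zero)

# Worst-case reconstruction error; bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, restored, max_error)
```

The smallest weight lands on code 0 and the largest on code 15, so every restored value is within half a step (`scale / 2`) of the original.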

sameearif88 · August 9, 2023, 5:16pm

Hello,

Is it possible to apply 4-bit quantization to the OpenAI Whisper and Facebook MMS audio models using Optimum?

Thanks

regisss · August 9, 2023, 5:22pm

@sameearif88 GPTQ only works for text models at the moment, so it won’t be possible to perform 4-bit quantization of speech models right away.

fxmarty · August 11, 2023, 3:17pm

@sameearif88 Feel free to open a feature request for it on GitHub.
