I'm working with the phi-4 MM model that I've fine-tuned for a vision task, and I'm trying to:
- Quantize the model (preferably using INT8 / 8-bit methods such as AWQ, AutoRound, or bitsandbytes).
- Convert the model to ONNX format for deployment.
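For context, this is roughly the 8-bit path I would try first via transformers + bitsandbytes. The checkpoint id, the `trust_remote_code` requirement, and whether phi-4 MM's custom multimodal modules tolerate 8-bit loading at all are assumptions on my part, not a verified recipe:

```python
# Hedged sketch: 8-bit weight loading via transformers + bitsandbytes.
# Everything phi-4-MM-specific below is my assumption, not documented behavior.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Plain 8-bit weight quantization (LLM.int8()-style) config.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

def load_phi4_mm_8bit():
    """Not called here: this downloads multi-GB weights and needs a CUDA GPU
    with bitsandbytes installed. Shown only to make the question concrete."""
    return AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-4-multimodal-instruct",  # assumed checkpoint id
        quantization_config=bnb_config,
        trust_remote_code=True,  # phi-4 MM ships custom modeling code
        device_map="auto",
    )
```

My worry is that the vision/audio adapters inside the checkpoint aren't plain `nn.Linear` stacks, so bitsandbytes may skip or break them without a clear error.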
So far, I haven't found clear documentation for quantizing phi-4 MM, especially given that it's a multimodal architecture. I'm particularly interested in:
- Best practices or tools for quantizing this model
- Whether dynamic or static quantization is supported
- The right way to export it to ONNX (especially since it’s not a typical vision transformer or CNN)