Help with Quantizing phi-4 MM Fine-Tuned Vision Model and Converting to ONNX

I’m working with the phi-4 MM model that I’ve fine-tuned for a vision task, and I’m trying to:

  1. Quantize the model (preferably with INT8 or other 8-bit methods such as AWQ, AutoRound, or bitsandbytes).
  2. Convert the model to ONNX format for deployment.

So far, I haven’t found clear documentation for quantizing phi-4 MM, especially given it’s a multimodal architecture. I’m particularly interested in:

  • Best practices or tools for quantizing this model
  • Whether dynamic or static quantization is supported
  • The right way to export it to ONNX (especially since it’s not a typical vision transformer or CNN)

Hi team, any help would really be appreciated.


Quantization methods such as bitsandbytes, AWQ, and GPTQ seem to be usable as-is, just as with other models.
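For example, loading the fine-tuned checkpoint with bitsandbytes 8-bit weights should look roughly like the sketch below; `./phi4-mm-finetuned` is a placeholder for your own checkpoint directory, not an official model id:

```python
# Sketch: 8-bit (LLM.int8) loading of a fine-tuned Phi-4 multimodal checkpoint.
# "./phi4-mm-finetuned" is a placeholder path for your own fine-tuned model.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

model_path = "./phi4-mm-finetuned"

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # weight-only 8-bit quantization

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",           # requires accelerate; places layers automatically
    torch_dtype=torch.float16,   # non-quantized modules stay in fp16
    trust_remote_code=True,      # Phi-4 MM ships custom modeling code
)
```

Note that AWQ, GPTQ, and AutoRound work differently from bitsandbytes: they are post-training calibration methods, so you quantize once (with a calibration dataset) and save the quantized checkpoint, rather than quantizing at load time as above.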

Regarding ONNX, conversion appears to be possible, but the runtime side (onnxruntime-genai, for example) is still under development, and Phi-4 multimodal does not seem to work in practice yet. I wonder if it might work with a development build…
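If you want to try the export anyway, the generic Hugging Face Optimum entry point is sketched below. As far as I know there is no official ONNX export configuration for Phi-4 multimodal, so expect this to fail or to require a custom `OnnxConfig`; the paths are placeholders:

```python
# Sketch only: generic Optimum ONNX export. Phi-4 multimodal is not a
# supported export architecture as far as I know, so this may raise an
# unsupported-architecture error or need a custom OnnxConfig.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="./phi4-mm-finetuned",  # placeholder checkpoint path
    output="phi4-mm-onnx",                     # output directory for .onnx files
    trust_remote_code=True,                    # needed for custom modeling code
)
```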