I'm working with the phi-4 MM model that I've fine-tuned for a vision task, and I'm trying to:
- Quantize the model (preferably using INT8 / 8-bit methods such as AWQ, AutoRound, or bitsandbytes).
- Convert the model to ONNX format for deployment.
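For context, this is roughly the 8-bit path I would try first via transformers + bitsandbytes. The checkpoint id, the `trust_remote_code` requirement, and whether phi-4 MM's custom multimodal modules tolerate 8-bit loading at all are assumptions on my part, not a verified recipe:

```python
# Hedged sketch: 8-bit weight loading via transformers + bitsandbytes.
# Everything phi-4-MM-specific below is my assumption, not documented behavior.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Plain 8-bit weight quantization (LLM.int8()-style) config.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

def load_phi4_mm_8bit():
    """Not called here: this downloads multi-GB weights and needs a CUDA GPU
    with bitsandbytes installed. Shown only to make the question concrete."""
    return AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-4-multimodal-instruct",  # assumed checkpoint id
        quantization_config=bnb_config,
        trust_remote_code=True,  # phi-4 MM ships custom modeling code
        device_map="auto",
    )
```

My worry is that the vision/audio adapters inside the checkpoint aren't plain `nn.Linear` stacks, so bitsandbytes may skip or break them without a clear error.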
So far, I haven't found clear documentation for quantizing phi-4 MM, especially given that it's a multimodal architecture. I'm particularly interested in:
- Best practices or tools for quantizing this model
- Whether dynamic or static quantization is supported
- The right way to export it to ONNX (especially since it’s not a typical vision transformer or CNN)