Quantized Model size difference when using Optimum vs. Onnxruntime

I see you have already opened an issue in the Optimum repository; it makes sense to focus the discussion in one place.

Feel free to close either that issue or this thread.