Order between optimization and quantization

thepurpleowl · September 19, 2023, 10:14am

We can perform model quantization and optimization either separately or together. When we want to apply both, is there an order that should be followed?

Does the order depend on the task, or on the optimization being applied?

IlyasMoutawwakil · September 19, 2023, 12:19pm

TL;DR: The order is optimization first, then quantization.

When using the CLI to export a model that generates (tasks like text-generation, text2text-generation, automatic-speech-recognition, etc.), optimizations should be specified during the export. That way they are applied before the post-processing, specifically before any model merging, which changes the graph.
Quantization, on the other hand, should be applied after optimization and merging (both performed by the export CLI command). The reason is that quantization introduces new quantized operators (nodes in the graph) that might interfere with the other two steps.
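
To make the order concrete, here is a minimal sketch that builds the two CLI calls in sequence: export with optimization first, then quantization of the exported directory. The model id (gpt2), the directory names, the O2 level, and the --avx512 target are illustrative assumptions, and the flags are the ones I recall from the Optimum docs, so it is worth double-checking them against `optimum-cli export onnx --help` and `optimum-cli onnxruntime quantize --help` for your installed version.

```python
import subprocess
from typing import List


def export_command(model_id: str, onnx_dir: str, level: str = "O2") -> List[str]:
    """Step 1: export to ONNX, requesting graph optimization during the export
    so that it happens before post-processing (e.g. decoder merging)."""
    return [
        "optimum-cli", "export", "onnx",
        "--model", model_id,
        "--optimize", level,  # assumed level; pick what suits your hardware
        onnx_dir,
    ]


def quantize_command(onnx_dir: str, quantized_dir: str) -> List[str]:
    """Step 2: quantize the already optimized (and merged) export."""
    return [
        "optimum-cli", "onnxruntime", "quantize",
        "--onnx_model", onnx_dir,
        "--avx512",  # assumed target; choose the one matching your CPU
        "-o", quantized_dir,
    ]


def build_pipeline(model_id: str, onnx_dir: str, quantized_dir: str) -> List[List[str]]:
    """Return the commands in the recommended order: optimize, then quantize."""
    return [
        export_command(model_id, onnx_dir),
        quantize_command(onnx_dir, quantized_dir),
    ]


def run_pipeline(model_id: str, onnx_dir: str, quantized_dir: str) -> None:
    """Run both steps, stopping at the first failure."""
    for cmd in build_pipeline(model_id, onnx_dir, quantized_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("gpt2", "gpt2_onnx", "gpt2_quantized")` would execute both steps (it needs Optimum with the ONNX Runtime extras installed). The point of the sketch is only the ordering: the quantization step takes the output directory of the optimized export as its input.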
