We can apply model quantization and optimization either separately or in combination. When we want to apply both, is there an order that should be followed?
Is the order task-dependent, or does it depend on the optimization being applied?
TL;DR: Optimization first, then quantization.
For tasks such as `text-generation`, `text2text-generation`, `automatic-speech-recognition`, etc., optimizations should be specified during the export. That way they are applied before the post-processing, in particular before any model merging, which changes the graph.
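To make the ordering concrete, here is a minimal, hypothetical sketch that encodes the rule from this answer as a step plan. The function `plan_steps`, the step names, and the `MERGING_TASKS` set are illustrative assumptions (the task names come from the answer, and its "etc." means the set is not exhaustive); this is not any library's API.

```python
# Hypothetical sketch: encodes the ordering rule described in the answer.
# Step names and the task set are illustrative assumptions, not a library API.

# Tasks named in the answer whose export post-processing may merge models.
# The answer says "etc.", so this set is not exhaustive.
MERGING_TASKS = {
    "text-generation",
    "text2text-generation",
    "automatic-speech-recognition",
}


def plan_steps(task: str) -> list[str]:
    """Return the order in which export, optimization and quantization run."""
    if task in MERGING_TASKS:
        # Optimization is requested as part of the export, so it runs on the
        # exported graph before post-processing such as model merging.
        steps = ["export+optimize", "post-process (may merge)"]
    else:
        # Assumption: with no merging step, optimization can run as its own
        # pass after export.
        steps = ["export", "optimize"]
    # In both cases quantization comes last.
    steps.append("quantize")
    return steps


if __name__ == "__main__":
    for task in ["text-classification", "text-generation"]:
        print(f"{task}: {' -> '.join(plan_steps(task))}")
```

The point the sketch captures is that quantization is always the final step, while for merging tasks optimization is moved into the export itself rather than applied afterwards.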