Optimization and Quantization of a TensorFlow Model

Hey there, I have a question about optimization and quantization of a TF-based model. I'm working with the "distilbert-base-multilingual-cased" model; for the PyTorch model I'm able to perform optimization and quantization, but for my use case I have to use the TF-based model, and there I'm unable to do it. It would be helpful if anyone could clarify the questions below.

  1. How do I perform optimization and quantization of a TensorFlow model available on the Hugging Face Model Hub?

  2. Does the Optimum library work for TensorFlow models as well? Can we use the ORTModelxxx classes with TensorFlow?

  3. Optimum[export] can convert a TensorFlow model to ONNX format with a chosen optimization level, but it has no quantization step, so after getting the optimized ONNX model, how can I quantize it?

Hi @D3v, you can easily export and optimize your TF model with Optimum CLI as follows:

optimum-cli export onnx --model distilbert-base-multilingual-cased --framework tf --optimize O2 my_onnx_model

Then, to quantize it, you can also use the CLI as described here.