Hey there, I have a question about optimization and quantization of a TensorFlow-based model. I'm working with the "distilbert-base-multilingual-cased" model: for the PyTorch version I'm able to perform both optimization and quantization, but my use case requires the TensorFlow model, and there I'm unable to do it. It would be helpful if anyone could clarify the questions below.
How do I perform optimization and quantization of a TensorFlow model available on the Hugging Face Model Hub?
Does the Optimum library work for TensorFlow models as well? Can the ORTModelXxx classes be used with TensorFlow?
optimum[exporters] can convert a TensorFlow model to ONNX and apply a chosen optimization level, but it has no quantization step. After getting the optimized ONNX model, how can I quantize it?