Hey there, I have a question about optimization and quantization of a TensorFlow-based model. I'm working with the "distilbert-base-multilingual-cased" model: for the PyTorch version I'm able to perform both optimization and quantization, but my use case requires the TensorFlow model, and there I'm unable to do it. It would be helpful if anyone could clarify the questions below.
How do I perform optimization and quantization of a TensorFlow model available on the Hugging Face Model Hub?
Does the Optimum library work for TensorFlow models as well? Can the ORTModelXxx classes be used with TensorFlow?
optimum[exporters] can convert a TensorFlow model to ONNX and apply a chosen optimization level, but it has no quantization step. After getting the optimized ONNX model, how can I quantize it?