Hi, I’m wondering if it’s possible to upload a quantized model? From the “model sharing” doc, it looks like we can only upload fine-tuned models based on HF Transformers models.
I learned something from I-BERT, which is a Quantization-Aware Training model. My question is: is it possible to upload an int8 transformer model produced through Post-Training Quantization rather than Quantization-Aware Training?
The difference between the two is that Post-Training Quantization only touches the inference phase (calibrating tensor ranges and quantizing/dequantizing between int8/fp32 for a perf speedup), whereas Quantization-Aware Training emulates the quantization precision loss by inserting fake_quant ops during the training phase.
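To make the Post-Training Quantization side concrete, here is a minimal sketch using PyTorch’s dynamic quantization API (the toy `nn.Sequential` model is just an illustration, not a real transformer): no training or fake_quant ops are involved, the fp32 model is simply rewritten so its Linear layers run with int8 weights at inference time.

```python
import torch
import torch.nn as nn

# A toy fp32 model standing in for a trained transformer.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

# Post-Training (dynamic) Quantization: rewrite Linear layers to
# quantize weights to int8; activations are quantized on the fly.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference works as before, just with int8 kernels under the hood.
x = torch.randn(1, 16)
out = qmodel(x)
print(out.shape)
```

The open question is then how to share `qmodel` (whose state dict contains packed int8 weights) through the Hub, since the standard `from_pretrained` flow expects fp32 checkpoints.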
Since Post-Training Quantization only involves the inference phase (a qconfig setting in PyTorch, a graph rewrite in TensorFlow), I don’t know whether it’s possible to upload such a quantized model, and if so, through which API?
Thanks for any guidance!