New pipeline for zero-shot text classification

Hi @valhalla, thanks for developing onnx_transformers. I have tried it with the zero-shot-classification pipeline and benchmarked ONNX against plain PyTorch, following the benchmark_pipelines notebook. I tried several SageMaker instances with various numbers of cores and CPU types. It seems that instances with more CPU cores give more speed-up, but those instances are also more expensive, and at a certain point the price is almost the same as using a GPU.
I wonder if there are other ways to speed things up while keeping the cost minimal. I found that quantization may help, but it seems that onnx_transformers doesn't support ONNX quantization yet. Do you have plans to support it? Could you kindly point me to a reference for using ONNX quantization with the zero-shot-classification pipeline (with or without onnx_transformers)?
Thanks in advance!