Hi @meggers.
I was like you: I knew how to use the old method (transformers/convert_graph_to_onnx.py) but not the new one (transformers.onnx) to get the quantized ONNX version of a Hugging Face task model (for example, a Question-Answering model).
To illustrate it, I published this notebook in Colab: ONNX Runtime with transformers.onnx for HF tasks models (for example: QA model) (not only with transformers/convert_graph_to_onnx.py).
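For reference, here is a minimal sketch of the new workflow as I understand it (assuming a recent transformers version with the transformers.onnx module and the question-answering export feature, plus onnxruntime installed; the checkpoint, question, and context below are just examples, not anything prescribed by the docs):

```python
# 1) Export with the new transformers.onnx module (run in a shell):
#
#    python -m transformers.onnx --model=distilbert-base-uncased-distilled-squad \
#        --feature=question-answering onnx/
#
# distilbert-base-uncased-distilled-squad is an example QA checkpoint;
# any QA model name from the Hub should work the same way.

import numpy as np
from onnxruntime import InferenceSession
from onnxruntime.quantization import QuantType, quantize_dynamic
from transformers import AutoTokenizer

# 2) Dynamic quantization of the exported graph (weights -> int8)
quantize_dynamic(
    "onnx/model.onnx",            # produced by the export step above
    "onnx/model-quantized.onnx",  # output path (example name)
    weight_type=QuantType.QInt8,
)

# 3) Inference with ONNX Runtime on the quantized model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
session = InferenceSession("onnx/model-quantized.onnx")

# Example question/context pair (made up for illustration)
question = "What does Hugging Face provide?"
context = "Hugging Face provides open-source NLP models and libraries."
inputs = tokenizer(question, context, return_tensors="np")

# The QA export returns start and end logits over the tokens
start_logits, end_logits = session.run(None, dict(inputs))
start = int(np.argmax(start_logits))
end = int(np.argmax(end_logits))
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```

The notebook linked above goes through the same steps in more detail (and compares latency with the non-quantized model).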
Hope that @lysandre @mfuntowicz @valhalla @lewtun will have some time to complete the online documentation Exporting transformers models and/or to update Microsoft's tutorials about ONNX.
Other topics about this subject:
- Inference with Finetuned BERT Model converted to ONNX does not output probabilities
- Gpt2 inference with onnx and quantize
- Got ONNXRuntimeError when try to run BART in ONNX format #12851
There is also the Accelerate Hugging Face models page from Microsoft, but the notebooks look very complicated (heavy code).