Hi @valhalla,
Just had a look. That is exactly the kind of simplicity I'm looking for: `nlp = pipeline("sentiment-analysis", onnx=True)`
Good prototype. The main caveats I can think of are:
(i) the ONNX conversion is done on-device — I'd rather pull pre-computed ONNX files from the model hub
(ii) some level of duplication of the huggingface source code
(iii) the tokenizer selection could be improved to use the fast versions when available
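To be concrete, here is a rough sketch of the kind of usage I have in mind. None of this exists yet — the `onnx=True` flag is from your prototype, and the `use_fast` behaviour is just how I'd imagine points (i) and (iii) being addressed:

```python
# Hypothetical API sketch — not real transformers code.
# Point (i): the pipeline would download a pre-exported .onnx file
# from the model hub instead of converting the model locally.
# Point (iii): the fast (Rust-based) tokenizer would be picked
# automatically whenever one is available for the model.
nlp = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    onnx=True,       # prototype's proposed flag
)
result = nlp("This is great!")
```

The key property is that the caller's code stays identical to the non-ONNX path, with the backend swap hidden behind the single flag.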
I would really like to have this option native in huggingface so I can use it in production applications where inference speed matters a lot.
Cheers,
Alex