Optimum Transformers
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with
Transformers, Optimum and ONNX Runtime.
Disclaimer
This project is inspired by Hugging Face Infinity and builds on the first steps taken by Suraj Patil.
@huggingface’s pipeline API is awesome, right? And onnxruntime is super fast! Wouldn’t it be great to combine these two?
– Tweet by Suraj Patil
It was under this slogan that I started the project, with the main goal of showing what I can do and getting into the @huggingface team.
How to use
Quick start
Usage is exactly the same as with the original Transformers pipelines, apart from a few extra arguments:
from optimum_transformers import pipeline
pipe = pipeline("text-classification", use_onnx=True, optimize=True)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
use_onnx - converts the default model to an ONNX graph
optimize - optimizes the converted ONNX graph with Optimum
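For example, here is a minimal sanity check (a sketch, assuming use_onnx defaults to the vanilla backend when omitted) that the accelerated pipeline returns the same predictions:

from optimum_transformers import pipeline

# Vanilla Transformers pipeline (plain PyTorch backend, assumed default).
vanilla = pipeline("text-classification")
# Accelerated pipeline: export to ONNX and optimize the graph with Optimum.
accelerated = pipeline("text-classification", use_onnx=True, optimize=True)

text = "This restaurant is awesome"
# Both calls should return the same label with near-identical scores.
print(vanilla(text))
print(accelerated(text))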
Optimum config
Read the Optimum documentation for more details.
from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig
ort_config = ORTConfig(quantization_approach="dynamic")
pipe = pipeline("text-classification", use_onnx=True, optimize=True, ort_config=ort_config)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
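To see what the quantized pipeline buys you, here is a rough timing sketch (assuming 100 sequential calls after a warm-up is representative; only quantization_approach="dynamic" comes from this README):

import time
from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig

quantized = pipeline("text-classification", use_onnx=True, optimize=True,
                     ort_config=ORTConfig(quantization_approach="dynamic"))

text = "This restaurant is awesome"
quantized(text)  # warm-up call so model loading and export are not timed
start = time.perf_counter()
for _ in range(100):
    quantized(text)
print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")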
Benchmark
With notebook
You can benchmark pipelines more easily with the benchmark_pipelines notebook.
With your own script
from optimum_transformers import Benchmark
task = "sentiment-analysis"
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"
num_tests = 100
benchmark = Benchmark(task, model_name)
results = benchmark(num_tests, plot=True)
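The same API can be looped to compare several checkpoints in one run (a sketch; the second checkpoint and the plot=False behavior are assumptions, not from this README):

from optimum_transformers import Benchmark

task = "sentiment-analysis"
checkpoints = [
    "philschmid/MiniLM-L6-H384-uncased-sst2",           # from this README
    "distilbert-base-uncased-finetuned-sst-2-english",  # assumed extra checkpoint
]
for model_name in checkpoints:
    benchmark = Benchmark(task, model_name)
    # plot=False is assumed to skip plotting and just return the numbers.
    results = benchmark(100, plot=False)
    print(model_name, results)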
Results
Note: These results were collected on my local machine, so if you have a high-performance machine to benchmark on, please contact me.
sentiment-analysis
Almost the same as in the Infinity launch video (AWS VM: g4dn.xlarge, GPU: NVIDIA T4, 128 tokens, 2.6 ms).
zero-shot-classification
With the typeform/distilbert-base-uncased-mnli model
token-classification
More results are available in the project repository on GitHub.