🚀 Optimum Transformers: accelerated NLP pipelines with Infinity speed

Optimum Transformers

Accelerated NLP pipelines for fast inference :rocket: on CPU and GPU. Built with :hugs:Transformers, Optimum and ONNX Runtime.


Disclaimer

This project was inspired by Hugging Face Infinity, and builds on a first step taken by Suraj Patil.

@huggingface’s pipeline API is awesome! :star_struck: And onnxruntime is super fast! :rocket: Wouldn’t it be great to combine these two?
– Tweet by Suraj Patil

It was under this slogan that I started this project!

And my main goal was to showcase my work and, hopefully, join the @huggingface team :hugs:

How to use

Quick start

The usage is exactly the same as the original pipelines, apart from a couple of extra arguments:

from optimum_transformers import pipeline

pipe = pipeline("text-classification", use_onnx=True, optimize=True)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
  • use_onnx - converts the default model to an ONNX graph
  • optimize - optimizes the converted ONNX graph with Optimum
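
The wrapper should also accept an explicit checkpoint, just like the original pipelines. A minimal sketch, assuming the model argument is forwarded unchanged to the underlying :hugs:Transformers pipeline (the checkpoint name is only an example):

from optimum_transformers import pipeline

# Assumption: `model` is forwarded to the underlying Transformers pipeline;
# the checkpoint below is just an illustration.
pipe = pipeline("question-answering",
                model="distilbert-base-cased-distilled-squad",
                use_onnx=True, optimize=True)
pipe(question="What does ONNX Runtime accelerate?",
     context="ONNX Runtime accelerates model inference on CPU and GPU.")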

Optimum config

Read the Optimum documentation for more details. With quantization_approach="dynamic", the model weights are quantized to int8 ahead of time and the activations are quantized on the fly, which typically speeds up CPU inference at a small cost in accuracy:

from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig

ort_config = ORTConfig(quantization_approach="dynamic")
pipe = pipeline("text-classification", use_onnx=True, optimize=True, ort_config=ort_config)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
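
For a quick sanity check of the speedup before reaching for the Benchmark helper below, a rough timing loop is enough. A minimal sketch using only the standard library; the iteration count and the example sentence are arbitrary:

import time
from optimum_transformers import pipeline

pipe = pipeline("text-classification", use_onnx=True, optimize=True)

pipe("This restaurant is awesome")  # warm-up call so session creation is not timed

start = time.perf_counter()
for _ in range(100):
    pipe("This restaurant is awesome")
avg_ms = (time.perf_counter() - start) / 100 * 1000
print(f"average latency: {avg_ms:.2f} ms")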

Benchmark

With notebook

You can benchmark pipelines more easily with the benchmark_pipelines notebook.

With your own script

from optimum_transformers import Benchmark

task = "sentiment-analysis"
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"
num_tests = 100

benchmark = Benchmark(task, model_name)
results = benchmark(num_tests, plot=True)
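
To compare several checkpoints on the same task, the same helper can simply be called in a loop. A sketch; the second model name is only an example of an alternative checkpoint:

from optimum_transformers import Benchmark

task = "sentiment-analysis"
model_names = [
    "philschmid/MiniLM-L6-H384-uncased-sst2",
    "distilbert-base-uncased-finetuned-sst-2-english",  # example alternative
]

for model_name in model_names:
    benchmark = Benchmark(task, model_name)
    results = benchmark(100, plot=True)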

Results

Note: these results were collected on my local machine, so if you have a high-performance machine to benchmark on, please contact me :hugs:

sentiment-analysis

Almost the same as in the Infinity launch video :hugs:

  • AWS VM: g4dn.xlarge
  • GPU: NVIDIA T4
  • Sequence length: 128 tokens
  • Latency: 2.6 ms

zero-shot-classification

With typeform/distilbert-base-uncased-mnli

token-classification

More results are available in the project repository on GitHub.


This is awesome!! You did exactly what I wanted to achieve a few weeks ago when I wrote a blog post about a custom way to use pipelines with ONNX. It wasn’t efficient, though, because it was also loading the PyTorch model along with the ONNX one. But your solution looks awesome!!!

I want to contribute!!

Congratz :clap:


Thank you @ChainYo! I really appreciate this :hugs:
I hope it will be useful for the whole community!

Hugging Face Team, maybe it is time to add new discussion topics here: ONNX, Optimum? :hugs:


Indeed, created a category for Optimum, thanks for the suggestion!
