Just to confirm what I wrote in the first post of this thread, I did the same tests with InferenceApi from huggingface_hub.inference_api.
Indeed, the huggingface_hub library has a client wrapper to access the Inference API programmatically (doc: “How to programmatically access the Inference API”).
Therefore, I ran the following code in a Google Colab notebook:
!pip install huggingface_hub
from huggingface_hub.inference_api import InferenceApi
API_TOKEN = 'xxxxxxx' # my HF API token
model_name = "t5-base"
inference = InferenceApi(repo_id=model_name, token=API_TOKEN)
print(inference)
I got the following output:
InferenceApi(options='{'wait_for_model': True, 'use_gpu': False}', headers='{'Authorization': 'xxxxxx'}', task='translation', api_url='https://api-inference.huggingface.co/pipeline/translation/t5-base')
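For reference, here is a minimal sketch of the raw HTTP request that this wrapper presumably sends, based on the api_url, headers and options shown in the output above (the library's exact internals may differ):

import requests

API_TOKEN = 'xxxxxxx'  # my HF API token
api_url = "https://api-inference.huggingface.co/pipeline/translation/t5-base"
headers = {"Authorization": f"Bearer {API_TOKEN}"}  # Bearer token header (value redacted above)
payload = {
    "inputs": "Translate English to German: My name is Claude.",
    "options": {"wait_for_model": True, "use_gpu": False},  # same options as reported above
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())  # expected shape: [{'translation_text': '...'}]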
Then, I ran the following code:
%%time
inputs = "Translate English to German: My name is Claude."
output = inference(inputs=inputs)
print(output)
And I got the following output:
[{'translation_text': 'Mein Name ist Claude.'}]
CPU times: user 14 ms, sys: 1.05 ms, total: 15.1 ms
Wall time: 651 ms
When I ran the same code a second time, I got the cached output:
[{'translation_text': 'Mein Name ist Claude.'}]
CPU times: user 14.3 ms, sys: 581 µs, total: 14.9 ms
Wall time: 133 ms
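As a side note, if one wants to time the non-cached path repeatedly, a possible workaround (an assumption on my side, based on the options documented for the Inference API) is to disable the cache in the request options, for example with a raw request:

import requests

API_TOKEN = 'xxxxxxx'  # my HF API token
api_url = "https://api-inference.huggingface.co/pipeline/translation/t5-base"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
payload = {
    "inputs": "Translate English to German: My name is Claude.",
    # use_cache=False should force a fresh computation instead of returning the cached result
    "options": {"wait_for_model": True, "use_cache": False},
}
print(requests.post(api_url, headers=headers, json=payload).json())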
We can observe that the inference times (initial and cached) correspond to those published in my first post (I guess this is expected, since the underlying code is the same). However, we end up with the same question: how can I get the Accelerated Inference API for a T5 model?
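As far as I can tell, the InferenceApi constructor also accepts a gpu argument (which is what fills the use_gpu option shown above), so a sketch of the call would be:

from huggingface_hub.inference_api import InferenceApi

API_TOKEN = 'xxxxxxx'  # my HF API token
# gpu=True should set use_gpu=True in the request options; whether it is honoured
# presumably depends on the plan attached to the token, which is exactly my question
inference = InferenceApi(repo_id="t5-base", token=API_TOKEN, gpu=True)
print(inference(inputs="Translate English to German: My name is Claude."))

But whether this actually enables accelerated inference for t5-base is precisely what I am trying to find out.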