How do I construct a function for inference?

psyche · September 13, 2022, 2:00am

I trained a Flax model for sequence classification, and training went well.
After that, I wrote an inference function for a single arbitrary text.
But it is much slower than the PyTorch and TensorFlow models.
The inference code is below:

import jax
from transformers import AutoTokenizer, FlaxBertForSequenceClassification

tokenizer =  AutoTokenizer.from_pretrained("<model_name_or_path>")
model = FlaxBertForSequenceClassification.from_pretrained("<model_name_or_path>", from_pt=True)

def inference(text:str):
    tokenized = tokenizer(text, return_tensors="jax", truncation=True, max_length=512)
    return model(**tokenized)

(It takes about 2 s per call when measured with timeit, while PyTorch and TensorFlow take about 400 ms.)
(I also tried jitting it, but it still takes about 600 ms; see below.)

# The jitted function receives the already-tokenized input_ids array, not raw text
def inference(input_ids):
    return model(input_ids).logits

tokenized = tokenizer(TEXT, return_tensors="jax", truncation=True, max_length=512)['input_ids']
jitted = jax.jit(inference)
jitted(tokenized).block_until_ready()
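
One thing that may be worth checking with the jitted version: `jax.jit` compiles once per distinct input shape, so the first call includes compilation time, and every text that tokenizes to a new length can trigger another compilation. The sketch below illustrates this with a small stand-in function rather than the real model; `toy_model`, `pad_to_fixed`, `MAX_LEN`, and `trace_count` are hypothetical names made up for the illustration, and the numbers it produces say nothing about actual BERT latency.

```python
import jax
import jax.numpy as jnp
import numpy as np

MAX_LEN = 16  # assumed fixed length for the demo; the post uses max_length=512
trace_count = {"n": 0}


def toy_model(input_ids, attention_mask):
    # Hypothetical stand-in for model(...).logits, not the real BERT forward pass.
    # This Python side effect runs only while JAX traces the function,
    # so it counts how many times a new compilation was triggered.
    trace_count["n"] += 1
    emb = jnp.sin(input_ids.astype(jnp.float32))
    mask = attention_mask.astype(jnp.float32)
    return (emb * mask).sum(axis=-1) / mask.sum(axis=-1)


jitted = jax.jit(toy_model)


def pad_to_fixed(ids, max_len=MAX_LEN, pad_id=0):
    # Truncate/pad a list of token ids to one fixed shape of (1, max_len).
    ids = np.asarray(ids[:max_len], dtype=np.int32)
    input_ids = np.full((1, max_len), pad_id, dtype=np.int32)
    attention_mask = np.zeros((1, max_len), dtype=np.int32)
    input_ids[0, : len(ids)] = ids
    attention_mask[0, : len(ids)] = 1
    return input_ids, attention_mask


# Variable-length inputs: every new sequence length is a new shape to trace.
for n in (3, 5, 3):
    ids = np.arange(1, n + 1, dtype=np.int32)[None, :]
    jitted(ids, np.ones_like(ids)).block_until_ready()
unpadded_traces = trace_count["n"]

# Fixed-length inputs: one shape, so the compiled function is reused.
trace_count["n"] = 0
for n in (3, 5, 3):
    input_ids, attention_mask = pad_to_fixed(list(range(1, n + 1)))
    out = jitted(input_ids, attention_mask).block_until_ready()
padded_traces = trace_count["n"]

print("traces without padding:", unpadded_traces)
print("traces with fixed padding:", padded_traces)
```

With the real tokenizer, a fixed shape can be requested with `padding="max_length"` together with `truncation=True` and `max_length=512`, passing both `input_ids` and `attention_mask` to the jitted function. When timing, it probably makes sense to run one warm-up call first and measure only the later calls, each ending in `block_until_ready()`.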

If there is a way to improve inference performance, please share some wisdom…!
(What I want is the fastest way to run inference, not training.)
