Batching in SageMaker Inference Toolkit

Thanks for putting together this great toolkit. I had a question about how inference batching is handled. I noticed that the examples here all appear to use a single input request. Once a model is deployed, if multiple requests hit the endpoint at once or in quick succession, are they automatically batched under the hood, or do you need to do something before hitting the endpoint to feed in a batch of inputs manually?


Pinging @philschmid and @jeffboudier in case they haven't seen this!

Hello @charlesatftl,

Thank you for the nice feedback.
Since the Inference Toolkit is built on top of the transformers pipelines, it currently handles batching the same way the pipelines do. A few pipelines support batching because it is faster for them, e.g. text-classification or zero-shot-classification; other pipelines, like question-answering, do not support batching and run sequential predictions instead.
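As a minimal local sketch (not SageMaker-specific), you can already pass a list of inputs directly to a transformers pipeline; whether it batches them or loops over them internally is up to that pipeline's implementation:

```python
from transformers import pipeline

# Uses the task's default model; swap in your own model id if you prefer.
classifier = pipeline("text-classification")

# Passing a list of inputs; the pipeline decides whether to batch or iterate.
results = classifier(["sentence 1", "sentence 2"])
print(results)
```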

But you can send a request with multiple inputs to any pipeline through the Inference Toolkit, e.g.

{"inputs": ["sentence 1","sentence 2"]}

The pipeline then either batches the inputs or runs them sequentially.
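For a deployed endpoint, sending such a multi-input payload could look like the following sketch, which uses the SageMaker runtime API via boto3 (the endpoint name here is a placeholder, not something from this thread):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Multiple inputs in a single request, matching the payload shape above.
payload = {"inputs": ["sentence 1", "sentence 2"]}

response = runtime.invoke_endpoint(
    EndpointName="my-huggingface-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```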

Dynamic batching across separate requests is currently not supported, but you could create a custom inference.py and implement it yourself.
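As a rough sketch of that direction, assuming the toolkit's model_fn/predict_fn override hooks and a standard text-classification checkpoint in the model directory, an inference.py could at least batch all inputs of one request into a single padded forward pass (true cross-request dynamic batching would additionally need request queuing logic on top of this):

```python
# inference.py -- hedged sketch of a custom handler that batches the inputs of
# a single request manually instead of relying on the pipeline's behaviour.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def model_fn(model_dir):
    # Load tokenizer and model from the unpacked model artifact.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = data["inputs"]
    if isinstance(inputs, str):
        inputs = [inputs]

    # Tokenize the whole list as one padded batch and run a single forward pass.
    encoded = tokenizer(inputs, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoded).logits

    predictions = logits.argmax(dim=-1).tolist()
    return [{"label": model.config.id2label[p]} for p in predictions]
```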

P.S. Batching in NLP is not as efficient as in CV, for example, since all sequences in a batch need to be padded to the same length, which can end up slower than doing sequential predictions.
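A tiny illustration of that padding cost (the model id is just an example): shorter sequences get padded up to the longest sequence in the batch, so the batch pays for tokens that carry no content.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # example model

batch = ["short sentence", "a much longer sentence that forces padding of the short one"]
encoded = tokenizer(batch, padding=True, return_tensors="pt")

# Both rows now have the same length; the attention_mask shows where padding was added.
print(encoded["input_ids"].shape)
print(encoded["attention_mask"])
```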