Recommended way to perform batch inference for generation

prajjwal1 · March 6, 2021, 10:18pm

I want to run inference over a large number of examples. Inference is relatively slow because generate is called many times in my use case (on an RTX 3090). What is the recommended way to perform batch inference? I'm using CTRL.
This post from @patrickvonplaten points to the test_batch_generation method of GPT2 for variable-sized sequences, but it doesn't look batched to me, because generate is called twice, once for each item in the sentences list. I also don't see the same method defined for CTRL. I also came across this GH issue, which expects users to concatenate the inputs into one full-length sequence, so the output is also a single vector containing the outputs for the individual sequences that were concatenated. This approach is a bit specific and may not work as I'd expect.

Can anyone suggest a decent approach to batched generation with variable-sized inputs? Ideally, I'd like to pass a list containing my inputs (just like sentences) and call generate once for 3-4 inputs.
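To make the question concrete, here is a rough sketch of what I have in mind: left-pad the prompts to a common length, build an attention mask, and call generate once for the whole list. This is only a sketch under assumptions. The helper names left_pad and generate_batch are made up for illustration, "gpt2" is a stand-in checkpoint, and reusing the EOS token as the padding token is my assumption. I have not checked whether the same setup works for CTRL, whose tokenizer may need a padding token chosen by hand.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id lists to a common length and build the attention mask.

    Hypothetical helper: it only illustrates what padding_side="left" produces.
    """
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        # Padding goes on the left so every prompt ends at the last position,
        # which is where a decoder-only model continues generating from.
        input_ids.append([pad_id] * n_pad + ids)
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


def generate_batch(sentences: List[str], model_name: str = "gpt2", max_new_tokens: int = 20) -> List[str]:
    """Call generate() once for a list of prompts (needs torch and transformers).

    Hypothetical helper; "gpt2" is a stand-in, not a claim about CTRL.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        # Assumption: reuse EOS as the padding token when none is defined.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    # The tokenizer pads the batch and returns the matching attention mask.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Each decoded string contains the prompt followed by its continuation.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


if __name__ == "__main__":
    ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
    print(ids)
    print(mask)
```

Usage would be something like generate_batch(["Hello, my dog is a little", "Today, I"]), which returns one decoded string per prompt. The attention mask is what lets the model ignore the padded positions, so the shorter prompts should not be affected by the padding.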
