Recommended way to perform batch inference for generation

I want to run inference on a large number of examples. It is relatively slow because my use case calls generate many times (on an RTX 3090). What is the recommended way to perform batch inference? I'm using CTRL.
This post from @patrickvonplaten points to the test_batch_generation method of GPT2 for variable-sized sequences, but it doesn't look batched to me, because generate is called twice, once for each item in the sentences list. I also don't see an equivalent method defined for CTRL. I also came across this GH issue, which expects users to concatenate everything into one full-length sequence, so the output is also a single vector holding the outputs for the individual sequences that were concatenated. This approach is rather specific and may not work as I'd expect.

Can anyone suggest a decent approach to batched generation with variable-sized inputs? Ideally, I'd like to pass a list of inputs (like sentences) and call generate once for every 3 to 4 inputs.
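For decoder-only models such as GPT2 and CTRL, batching variable-length prompts is commonly done by padding every prompt in a batch to the same length on the left and passing an attention mask. That way the new tokens are appended directly after each prompt's last real token instead of after a run of pad tokens. Below is a minimal pure-Python sketch of that padding step plus splitting the inputs into groups of 3 to 4. The helper names chunked and left_pad_batch, the pad id 0, and the token ids are all hypothetical, chosen only for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def left_pad_batch(
    batch: Sequence[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token-id lists to a common length (hypothetical helper).

    Returns (input_ids, attention_mask): the mask is 0 on padding and
    1 on real tokens, so the model can ignore the pad positions.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Padding goes on the LEFT so every prompt ends at the same position.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Hypothetical token ids for four prompts of different lengths,
# grouped three at a time (one generate call per group).
prompts = [[11, 12, 13], [21], [31, 32], [41, 42, 43, 44]]
for batch in chunked(prompts, 3):
    ids, mask = left_pad_batch(batch, pad_id=0)
    print(ids, mask)
```

In transformers, setting tokenizer.padding_side = "left" and calling tokenizer(batch_texts, return_tensors="pt", padding=True) should produce the equivalent input_ids and attention_mask as tensors, which can then be passed to a single model.generate call per chunk. If the tokenizer defines no pad token, one has to be assigned first (GPT2's test reuses the EOS token). Which token is appropriate for CTRL is an assumption worth checking.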
