I want to run inference on a large number of examples. Inference is relatively slow because `generate` is called many times, once per example, in my use case (on an RTX 3090). I wanted to ask what the recommended way to perform batch inference is; I'm using CTRL.
This post from @patrickvonplaten points to GPT-2's `test_batch_generation` method for variable-sized sequences, but it doesn't appear to be truly batched, since `generate` is called twice, once for each item in the `sentences` list. I also don't see the same method defined for CTRL. I also came across a GitHub issue that expects users to concatenate everything into one full-length sequence, so the output is a single vector containing the outputs for the individual sequences that were concatenated. That approach is rather specific and may not work as I'd expect.
Can anyone suggest a decent approach to performing batched generation with variable-sized inputs? Ideally, I'd like to pass a list containing my inputs (just like `sentences`) and call `generate` once for 3-4 inputs.
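To make the question concrete, here's the kind of preprocessing I assume batching would need: left-padding the shorter sequences to a common length and building an attention mask, so a single `generate` call can handle all of them (for decoder-only models like CTRL/GPT-2, padding on the left keeps generation continuing from the real tokens). This is just a sketch with made-up token ids and a made-up pad id of 0, not actual tokenizer output:

```python
def left_pad(batch, pad_id=0):
    """Left-pad each token-id sequence to the batch max length and build an
    attention mask (1 = real token, 0 = padding).

    The padded ids and mask are what one would pass to model.generate(...)
    as input_ids and attention_mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)       # pad on the left
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask

# Two variable-length "sentences" as token ids:
ids, mask = left_pad([[5, 6, 7], [8, 9]])
# ids  -> [[5, 6, 7], [0, 8, 9]]
# mask -> [[1, 1, 1], [0, 1, 1]]
```

I believe the tokenizer can produce the same thing directly via `tokenizer.padding_side = "left"` and `tokenizer(sentences, padding=True, return_tensors="pt")` (after assigning a `pad_token`), but I'm not sure whether `generate` then handles this correctly for CTRL.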