Batched pipeline inference has little speed improvement on longer texts

Hi! I’m doing zero-shot classification using the pipeline. I noticed that when the input text is short (e.g. 10 words), the speedup from batched inference is large: roughly 2× at batch size 2 versus no batching. However, when the inputs are longer (e.g. ~500 words), passing the texts sequentially is more or less as fast as batched inference. Is it because we have already “maxed out” the GPU’s compute?

In other words, running inference on 10 sentences of 10 words each at batch size 10 is roughly equivalent to running one single sentence of 100 words at batch size 1 (no batching). I’ve seen something similar reported here. Just want to ask if this is correct, and whether anyone else has had the same experience?
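To make that claim concrete, here is the word-count arithmetic behind it (just a quick sanity check, not part of my benchmark):

```python
# Total words per forward pass under the equivalence claim:
# 10 sentences of 10 words (batch size 10) vs. one 100-word sentence (batch size 1)
batch_of_short = ['word ' * 10] * 10   # 10 sentences, 10 words each
one_long = ['word ' * 100]             # 1 sentence, 100 words

def total_words(texts):
    return sum(len(t.split()) for t in texts)

print(total_words(batch_of_short), total_words(one_long))  # → 100 100
```

Both cases push 100 words through the model per call; the question is whether the GPU cares how they are split up.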

You can find my code here:

import timeit

from transformers import pipeline


pipe = pipeline('zero-shot-classification', model='facebook/bart-large-mnli', device=0)
text = 'Hi! I’m now doing a zero-shot classification using the pipeline.'  # 10 words

# Batched inference speed for different input sentence lengths and batch sizes
for sentence_len in [10, 100, 500]:
    for batch_size in [1, 2, 4, 8]:
        # Repeat the 10-word text to reach the target length; classify 64 copies per run
        time = timeit.timeit(lambda: pipe([text * (sentence_len // 10)] * 64,
                                          candidate_labels=['topic 1', 'topic 2'],
                                          batch_size=batch_size),
                             number=10)
        # Total time for number=10 runs, divided by the 64 sentences in each run
        print(f'Sentence length: {sentence_len}, batch size: {batch_size}, time per sentence: {time/64:.3f}')

In Colab (free T4 GPU), I got the following results:

Sentence length: 10,  batch size: 1, time per sentence: 0.831
Sentence length: 10,  batch size: 2, time per sentence: 0.355
Sentence length: 10,  batch size: 4, time per sentence: 0.181
Sentence length: 10,  batch size: 8, time per sentence: 0.160
Sentence length: 100, batch size: 1, time per sentence: 1.013
Sentence length: 100, batch size: 2, time per sentence: 0.934
Sentence length: 100, batch size: 4, time per sentence: 0.875
Sentence length: 100, batch size: 8, time per sentence: 0.873
Sentence length: 500, batch size: 1, time per sentence: 5.567
Sentence length: 500, batch size: 2, time per sentence: 5.310
Sentence length: 500, batch size: 4, time per sentence: 4.730
Sentence length: 500, batch size: 8, time per sentence: 4.673
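For what it’s worth, converting the timings above into rough words-per-second shows the trend I’m describing (a back-of-envelope calculation from my own numbers above, not extra measurements):

```python
# Rough throughput (words/second) derived from the Colab timings above:
# sentence_len / time_per_sentence, at the smallest and largest batch sizes
timings = {  # (sentence_len, batch_size): seconds per sentence
    (10, 1): 0.831, (10, 8): 0.160,
    (500, 1): 5.567, (500, 8): 4.673,
}
for (n_words, bs), t in timings.items():
    print(f'{n_words} words, batch {bs}: {n_words / t:.1f} words/s')
```

Going from batch size 1 to 8 gives about a 5× throughput gain for the 10-word inputs, but only around 1.2× for the 500-word inputs, which is what makes me suspect GPU saturation.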