How to make `pipeline` automatically scale?

Hi, I have trained a model for text classification. When I load it with the transformers pipeline, it works well. The problem comes when I pass it a very large list of sentences as input: I get a CUDA out-of-memory error. When I process each example one by one in a for loop, I don't get this error.

Is there an option to pass when instantiating the pipeline() object that makes it run predictions on a very large set of inputs automatically (for example, by setting a batch size and iterating over the batches)? Or do I have to code this myself?

@sgugger
Thanks.

No, you will have to code this yourself; the pipeline API is not designed to handle a large number of inputs automatically.
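Something like this minimal sketch would do (the model name and batch size here are just placeholders, not anything from this thread):

from transformers import pipeline

# Hypothetical checkpoint name; substitute your own fine-tuned model.
classifier = pipeline("text-classification", model="my-finetuned-model", device=0)

def predict_in_batches(texts, batch_size=32):
    """Run the pipeline over texts in fixed-size chunks to bound GPU memory."""
    results = []
    for i in range(0, len(texts), batch_size):
        results.extend(classifier(texts[i:i + batch_size]))
    return results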

I see, thanks. I think what I need are optimizations like ONNX Runtime, quantization, etc.
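For example, dynamic quantization in PyTorch would look roughly like this (the checkpoint name is a placeholder, and the quantized model runs on CPU):

import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical checkpoint; substitute your fine-tuned model.
model = AutoModelForSequenceClassification.from_pretrained("my-finetuned-model")
model.eval()

# Replace the Linear layers with int8 dynamically quantized versions;
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)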

The only problem I have is that the HF ONNX converter can’t convert multi-label sequence classification models yet, AFAIK. Is it planned for a future release?

Here is a simple but fairly awful approach I'm using. My issue is that I can't figure out how to estimate the model's memory usage in advance, so I can't automatically determine an appropriate batch size. Any thoughts?

# taken from somewhere online
def chunk_list(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


class ZeroShotPipelineMiniBatch:
    """Wraps a zero-shot pipeline and runs inputs through it in mini-batches."""

    def __init__(self, zs_pipeline, max_input_size=40):
        # Rough budget: total (sequence, label) pairs per forward pass.
        self.max_input_size = max_input_size
        self.pipeline = zs_pipeline

    def __call__(self, inputs, label_list, *args, **kwargs):
        # The zero-shot pipeline scores every sequence against every label,
        # so divide the budget by the label count (at least one per batch).
        batch_size = max(1, self.max_input_size // len(label_list))

        ret = []
        for chunk in chunk_list(inputs, batch_size):
            r = self.pipeline(chunk, label_list, *args, **kwargs)
            # A single input returns a dict, a batch returns a list of
            # dicts; normalize everything to one flat list.
            if isinstance(r, list):
                ret += r
            else:
                ret.append(r)
        return ret
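
For reference, usage would look something like this (the texts and labels are made up):

from transformers import pipeline

zs = pipeline("zero-shot-classification", device=0)
batched = ZeroShotPipelineMiniBatch(zs, max_input_size=40)

texts = ["The meeting moved to Friday.", "Stocks fell sharply today."]
labels = ["business", "scheduling", "sports"]
results = batched(texts, labels)  # batch_size = max(1, 40 // 3) = 13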