Hi, I have trained a model for text classification. When I load it with the transformers pipeline, it works fine. The problem comes when I pass it a very large list of sentences as input: I get a CUDA out of memory error. When I feed the examples one by one in a for loop, I don't get the error.
Is there an option I can pass when instantiating the pipeline() object that lets it handle a very large list of inputs automatically (for example by setting a batch size and iterating over the batches)? Or do I have to code this myself?
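Roughly, what I'd like is something like the sketch below. The batch_size argument is just my guess at what such an option could look like, and the model name is a placeholder:

from transformers import pipeline

classifier = pipeline("text-classification", model="my-finetuned-model")
sentences = ["first example", "second example"]  # in reality a very large list

# Hoped-for behaviour: the pipeline splits the list into batches internally
# instead of moving everything onto the GPU at once. The batch_size argument
# here is an assumption on my part, not something I've confirmed exists.
predictions = classifier(sentences, batch_size=32)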
I see, thanks. I think what I need are optimizations like ONNX Runtime, quantization, etc.
The only problem I have is that the HF ONNX converter can't convert multi-label sequence classification models yet, AFAIK. Is support for that planned for a future release?
Here's a simple but fairly awful approach I'm using (code below). My issue is that I can't figure out how to estimate the model's memory footprint in advance, so I can't automatically pick an appropriate batch size. Any thoughts?
import numpy as np

# chunk_list taken from somewhere online
def chunk_list(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

class ZeroShotPipelineMiniBatch():
    """Wraps a zero-shot pipeline and feeds it the inputs in small chunks."""

    def __init__(self, zs_pipeline, max_input_size=40):
        self.MAX_INPUT_SIZE = max_input_size
        self.pipeline = zs_pipeline

    def __call__(self, inputs, label_list, *args, **kwargs):
        # Each input is scored against every candidate label, so shrink the
        # chunk size as the number of labels grows (at least 1 to avoid a zero step).
        batch_size = max(1, int(np.floor(self.MAX_INPUT_SIZE / len(label_list))))
        ret = []
        for chunk in chunk_list(inputs, batch_size):
            r = self.pipeline(chunk, label_list, *args, **kwargs)
            # The pipeline returns a dict for a single input and a list otherwise.
            if isinstance(r, list):
                ret += r
            else:
                ret += [r]
        return ret
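For reference, here's roughly how I call it; the model name, sentences and labels are just examples:

from transformers import pipeline

zsc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
batched_zsc = ZeroShotPipelineMiniBatch(zsc, max_input_size=40)

sentences = ["first example sentence", "second example sentence"]  # the big input list goes here
results = batched_zsc(sentences, ["sports", "politics", "technology"])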