Text classification pipeline very slow after adding padding and truncation to the tokenizer

I have a fine-tuned xlm-roberta-base for binary classification. When I run inference on the model with:

classification_pipeline = pipeline("text-classification", model=self.model, tokenizer=self.tokenizer, top_k=None)
results = classification_pipeline(input_normalized_text)

the processing time is between 0.5 and 2 seconds.

However, when I add padding, truncation, and batch_size to the pipeline call with:

results = classification_pipeline(input_normalized_text, padding='max_length', truncation=True, batch_size=8)

the processing time suddenly jumps to about a minute.
Is there anything I'm doing wrong? How can I add truncation and padding to the tokenizer without impacting performance this much?
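(As a quick sanity check, something along these lines can confirm which device the pipeline is actually running on; it reuses `classification_pipeline` from the snippet above.)

```python
# Sanity check (reusing `classification_pipeline` from above): print which
# device the pipeline and its underlying model are actually on.
print(classification_pipeline.device)        # e.g. cuda:0 or cpu
print(classification_pipeline.model.device)  # device holding the model weights
```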

If I only use padding or truncation, the processing time still stays at about 1 minute; however, when I add only the batch size, it goes back down to the original 0.5-2 seconds.
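A small timing sketch like the following (a hypothetical harness, reusing `classification_pipeline` and `input_normalized_text` from above) makes this comparison easy to reproduce:

```python
import time

# Hypothetical timing harness: run the same input through the pipeline with
# each combination of call kwargs and print the elapsed time.
configs = {
    "no extra kwargs": {},
    "batch_size only": {"batch_size": 8},
    "padding only": {"padding": "max_length"},
    "truncation only": {"truncation": True},
    "padding + truncation + batch_size": {
        "padding": "max_length", "truncation": True, "batch_size": 8,
    },
}
for name, kwargs in configs.items():
    start = time.perf_counter()
    classification_pipeline(input_normalized_text, **kwargs)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```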

Found the issue: apparently, if I add padding and truncation, the pipeline stops using the GPU for inference unless the device is specified explicitly in the pipeline. If I don't add truncation and padding, it uses the GPU by default.
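A minimal sketch of the fix, assuming a CUDA GPU is available and using a placeholder name for the fine-tuned checkpoint: pass `device` to `pipeline()` so inference stays on the GPU even when padding and truncation are set.

```python
import torch
from transformers import pipeline

# Sketch of the fix (placeholder checkpoint path): pin the pipeline to the GPU
# explicitly instead of relying on the default device selection.
device = 0 if torch.cuda.is_available() else -1  # -1 selects CPU for pipelines

classification_pipeline = pipeline(
    "text-classification",
    model="path/to/finetuned-xlm-roberta-base",  # placeholder for the fine-tuned model
    top_k=None,
    device=device,
)

results = classification_pipeline(
    input_normalized_text,
    padding=True,       # dynamic padding per batch; 'max_length' pads every example to the model max length
    truncation=True,
    batch_size=8,
)
```

With the device pinned, `padding='max_length'` still works if fixed-length batches are needed, though dynamic padding (`padding=True`) usually keeps the per-batch tensors smaller.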