I have a fine-tuned xlm-roberta-base for binary classification. When I run inference with:

classification_pipeline = pipeline("text-classification", model=self.model, tokenizer=self.tokenizer, top_k=None)
results = classification_pipeline(input_normalized_text)

the processing time is between 0.5 and 2 seconds.
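For reference, here is a minimal, self-contained sketch of the fast path. The checkpoint path and input texts are placeholders for my actual setup:

import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Placeholder path for my fine-tuned checkpoint.
model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-xlm-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-xlm-roberta-base")

classification_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)

input_normalized_text = ["example sentence one", "example sentence two"]  # placeholder inputs

start = time.perf_counter()
results = classification_pipeline(input_normalized_text)
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # 0.5-2 s in my runs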
However, when I add padding, truncation, and batch_size to the pipeline call:

results = classification_pipeline(input_normalized_text, padding="max_length", truncation=True, batch_size=8)

the processing time suddenly jumps to about a minute.
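The slow variant, reusing the pipeline and inputs from the sketch above (only the call changes; the comment on padding reflects my understanding of the tokenizer's behavior):

# Same pipeline and inputs as above; only the call changes.
start = time.perf_counter()
results = classification_pipeline(
    input_normalized_text,
    padding="max_length",  # pads every sequence up to the model's max length (512 for xlm-roberta-base)
    truncation=True,
    batch_size=8,
)
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1 minute in my runs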
Is there anything I'm doing wrong? How can I add truncation and padding to the tokenizer without hurting performance this much?