I have a fine-tuned xlm-roberta-base for binary classification. When I run inference with:

classification_pipeline = pipeline("text-classification", model=self.model, tokenizer=self.tokenizer, top_k=None)
results = classification_pipeline(input_normalized_text)

the processing time is between 0.5 and 2 seconds.
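For reference, here is a minimal, self-contained sketch of the fast path. The checkpoint path and input texts are placeholders for my actual setup:

import time
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Placeholder path for my fine-tuned checkpoint.
model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-xlm-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-xlm-roberta-base")

classification_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)

input_normalized_text = ["example sentence one", "example sentence two"]  # placeholder inputs

start = time.perf_counter()
results = classification_pipeline(input_normalized_text)
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # 0.5-2 s in my runs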
However, when I add padding, truncation, and batch_size to the pipeline call:

results = classification_pipeline(input_normalized_text, padding="max_length", truncation=True, batch_size=8)

the processing time suddenly jumps to about a minute.
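The slow variant, reusing the pipeline and inputs from the sketch above (only the call changes; the comment on padding reflects my understanding of the tokenizer's behavior):

# Same pipeline and inputs as above; only the call changes.
start = time.perf_counter()
results = classification_pipeline(
    input_normalized_text,
    padding="max_length",  # pads every sequence up to the model's max length (512 for xlm-roberta-base)
    truncation=True,
    batch_size=8,
)
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1 minute in my runs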
Is there anything I'm doing wrong? How can I add truncation and padding to the tokenizer without hurting performance this much?