I’m using a TextClassificationPipeline with the pretrained model “bhadresh-savani/roberta-base-emotion”, and I would like it to truncate inputs to the model’s maximum token sequence length. This does not happen by default: over-length inputs are passed through untruncated.
My code to set up the pipeline looks like this:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

tokenizer = AutoTokenizer.from_pretrained("bhadresh-savani/roberta-base-emotion")
model = AutoModelForSequenceClassification.from_pretrained("bhadresh-savani/roberta-base-emotion")
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True, device=0)
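To make the problem concrete, here is a minimal sketch of the behavior I’m after. It assumes that the pipeline forwards extra tokenizer kwargs such as truncation and max_length from the call to the underlying tokenizer (I haven’t confirmed this holds on every transformers version); what I actually want is for truncation to apply automatically, without passing these on every call.

# Hypothetical over-length input; anything past the model's 512-token
# limit should be cut off rather than causing a failure.
long_text = "I am so happy today! " * 500

# Assumption: recent transformers versions forward tokenizer kwargs
# (truncation, max_length) from the pipeline call to the tokenizer.
scores = classifier(long_text, truncation=True, max_length=512)
print(scores)  # one list of {label, score} dicts per input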