Truncating sequence -- within a pipeline

AlanFeder · July 16, 2020, 11:25pm

Hi all,

Thanks for making this forum!

I have a list of texts, one of which happens to be 516 tokens long. I have been using the feature-extraction pipeline to process the texts, with just this simple call:

from transformers import pipeline
nlp = pipeline('feature-extraction')

When it gets up to the long text, I get an error:

Token indices sequence length is longer than the specified maximum sequence length for this model (516 > 512). Running this sequence through the model will result in indexing errors

Alternatively, if I use the sentiment-analysis pipeline (created by nlp2 = pipeline('sentiment-analysis')), I do not get the error.

Is there a way for me to pass an argument to the pipeline function to make it truncate at the model's maximum input length? I tried reading this, but I was not sure how to keep everything else in the pipeline the same/default, except for this truncation.
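For reference, one way to get truncation without depending on which arguments the pipeline accepts is to call the tokenizer and model directly. The following is only a minimal sketch under stated assumptions, not necessarily what the pipeline does internally: the helper names extract_features_truncated and load_and_run are hypothetical, and the model name distilbert-base-cased is an assumption about what the feature-extraction pipeline loads by default, so substitute whichever model you actually use.

```python
from typing import Any


def extract_features_truncated(text: str, tokenizer: Any, model: Any,
                               max_length: int = 512) -> Any:
    """Tokenize `text` with truncation, then return the model's last hidden state."""
    # truncation=True drops tokens beyond max_length; special tokens such as
    # [CLS] and [SEP] count toward the limit.
    inputs = tokenizer(text, truncation=True, max_length=max_length,
                       return_tensors="pt")
    outputs = model(**inputs)
    return outputs.last_hidden_state


def load_and_run(text: str, model_name: str = "distilbert-base-cased") -> Any:
    """Hypothetical convenience wrapper; `model_name` is an assumed default."""
    # Imports are local so the helper above can be reused without loading
    # these libraries.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():  # inference only, no gradients needed
        hidden = extract_features_truncated(text, tokenizer, model)
    # hidden has shape (1, number_of_tokens, hidden_size);
    # return the per-token vectors for the single input text.
    return hidden[0].tolist()
```

Calling load_and_run(long_text) on the 516-token text should then run without the length warning, because the input is cut to 512 tokens before it reaches the model. Note that the truncated tail of the text is simply ignored, so its content does not influence the features.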
