Truncating sequence -- within a pipeline

Hi all,

Thanks for making this forum!

I have a list of texts, one of which happens to be 516 tokens long. I have been using the feature-extraction pipeline to process the texts, created with just the default call:

nlp = pipeline('feature-extraction')

When it gets up to the long text, I get an error:

Token indices sequence length is longer than the specified maximum sequence length for this model (516 > 512). Running this sequence through the model will result in indexing errors

Alternatively, if I use the sentiment-analysis pipeline (created by nlp2 = pipeline('sentiment-analysis')), I do not get the error.

Is there a way for me to pass an argument to the pipeline function so that it truncates at the model's maximum input length? I tried reading this, but I was not sure how to keep everything else in the pipeline at its defaults, except for this truncation.
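
To make concrete what I mean by truncating, here is a rough plain-Python sketch of the behaviour I am after. It is only an illustration: truncate_ids, CLS_ID, SEP_ID and the filler id are names and values I made up, not anything from the pipeline API, and I am assuming a BERT-style layout where the sequence starts and ends with a special token.

```python
from typing import List

# Hypothetical values for illustration only (assumed BERT-style special tokens).
CLS_ID = 101
SEP_ID = 102


def truncate_ids(ids: List[int], max_length: int = 512) -> List[int]:
    """Cut a token-id sequence to max_length, keeping the closing special token."""
    if max_length < 2:
        raise ValueError("max_length must leave room for both special tokens")
    if len(ids) <= max_length:
        # Already short enough: return an unchanged copy.
        return list(ids)
    # Keep the first max_length - 1 ids, then re-append the final token.
    return ids[: max_length - 1] + [ids[-1]]


# A 516-token input, like the one that triggers the message above.
long_ids = [CLS_ID] + [2000] * 514 + [SEP_ID]
short_ids = truncate_ids(long_ids)
print(len(long_ids), len(short_ids))
```

As far as I understand, calling a tokenizer directly with truncation=True and max_length=512 does something along these lines, but what I would like is the same effect from inside the pipeline call.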
