I’ve been experimenting with a summarization pipeline:
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=0)
It works well most of the time, but occasionally I get an exception that appears to be caused by submitting an input that is too large, at least that's what the error message seems to suggest:
Token indices sequence length is longer than the specified maximum sequence length for this model (1068 > 1024). Running this sequence through the model will result in indexing errors
/opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [420,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
My questions are:
1 - Is this indeed caused by submitting a string with too many words?
2 - What is the "sequence length", and how do I calculate it on a string before submitting it for summarization?
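For context, here is how I imagine checking the length myself, assuming the model's tokenizer is what determines the count (I'm not sure this is the right approach):

```python
from transformers import AutoTokenizer

# Load the tokenizer that pairs with the summarization model
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

text = "Some long article text ..."

# Tokenize without truncation to see the full token count the model would receive
token_ids = tokenizer(text, truncation=False)["input_ids"]

# My guess: if this exceeds tokenizer.model_max_length (1024 for this model),
# that is when the indexing error occurs?
print(len(token_ids), tokenizer.model_max_length)
```

Is comparing `len(token_ids)` against `tokenizer.model_max_length` the right way to pre-check an input, or is there a built-in way to have the pipeline truncate for me?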