I have a script that runs on a GPU, iterates over a number of text chunks of varying size, and uses the summarization pipeline to return a single-sentence summary of each input.
The tool behaves as expected for much of the loop, but eventually breaks on some data point, after which the pipeline fails for every subsequent data point.
Essentially: how can I keep this script from failing over the entire loop when it fails on a single sample?
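Concretely, the kind of per-sample guard I have in mind looks like the sketch below (hypothetical code; the pipeline and `paragraphs` are the same as in my real snippet further down). On the CPU something like this would presumably be enough, but on the GPU every call after the first failure raises as well:

```python
from transformers import pipeline

summarizer = pipeline("summarization", "lidiya/bart-large-xsum-samsum", device=0)

paragraphs = ["first text chunk", "second text chunk"]  # stand-in for my real chunks

summaries = []
for paragraph in paragraphs:
    try:
        summaries.append(summarizer(paragraph)[0]['summary_text'])
    except Exception as err:
        # Skip just this sample and keep the loop alive. On the GPU this
        # does not help: once the device-side assert fires, every later
        # call fails too, which is why I am asking about resetting.
        print(f"Skipping one chunk: {err}")
```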
Below is a snippet of the code used and the errors received.
```python
from transformers import pipeline

# Bart XSUM
summarizer = pipeline("summarization", "lidiya/bart-large-xsum-samsum", device=0)

for paragraph in paragraphs:
    summary = summarizer(paragraph)[0]['summary_text']
```
When running on the GPU, this error occurs for every instance after the first failure:
```
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
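For what it's worth, I believe the flag mentioned in the message can also be set from inside the script rather than on the command line, though I have only seen it documented as an environment variable:

```python
import os

# Force synchronous CUDA kernel launches so the stacktrace points at the
# actual failing op; this must be set before CUDA is initialized, so
# ideally before importing torch or building the pipeline.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```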
When I run on the CPU instead, the (much slower) script works and fails on only a single data point, which appears to be the one causing the trouble:
```
index out of range in self
```
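My suspicion is that this data point is longer than the model's maximum input length, so an embedding lookup receives an out-of-range index. If that is the cause, truncating at tokenization time might avoid the failure entirely; a hedged one-line change, assuming the summarization pipeline forwards `truncation` to its tokenizer:

```python
# Truncate over-long inputs to the model's maximum length before encoding.
summary = summarizer(paragraph, truncation=True)[0]['summary_text']
```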
Is there a way to reset the GPU after a failure in the pipeline so the loop can continue as expected?