I have a script that runs on a GPU, iterates over a number of text chunks of varying size, and uses the summarization pipeline to return a single-sentence summary of each input.
The tool behaves as expected for much of the loop, but eventually breaks on some data point, after which the pipeline fails for every subsequent data point.
Essentially: how can I keep this script from failing over the entire loop when it fails on a single sample?
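Concretely, the kind of per-sample guard I have in mind looks like the sketch below (hypothetical code; the pipeline and `paragraphs` are the same as in my real snippet further down). On the CPU something like this would presumably be enough, but on the GPU every call after the first failure raises as well:

```python
from transformers import pipeline

summarizer = pipeline("summarization", "lidiya/bart-large-xsum-samsum", device=0)

paragraphs = ["first text chunk", "second text chunk"]  # stand-in for my real chunks

summaries = []
for paragraph in paragraphs:
    try:
        summaries.append(summarizer(paragraph)[0]['summary_text'])
    except Exception as err:
        # Skip just this sample and keep the loop alive. On the GPU this
        # does not help: once the device-side assert fires, every later
        # call fails too, which is why I am asking about resetting.
        print(f"Skipping one chunk: {err}")
```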
Below is a snippet of the code used and the errors received.
```python
from transformers import pipeline

# Bart XSUM
summarizer = pipeline("summarization", "lidiya/bart-large-xsum-samsum", device=0)

for paragraph in paragraphs:
    summary = summarizer(paragraph)[0]['summary_text']
```
When running on the GPU, this error occurs for every instance after the first failure:
```
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
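For what it's worth, I believe the flag mentioned in the message can also be set from inside the script rather than on the command line, though I have only seen it documented as an environment variable:

```python
import os

# Force synchronous CUDA kernel launches so the stacktrace points at the
# actual failing op; this must be set before CUDA is initialized, so
# ideally before importing torch or building the pipeline.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```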
When I run on the CPU instead, the (much slower) script works and fails on only a single data point, which appears to be the one causing the trouble:
```
index out of range in self
```
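My suspicion is that this data point is longer than the model's maximum input length, so an embedding lookup receives an out-of-range index. If that is the cause, truncating at tokenization time might avoid the failure entirely; a hedged one-line change, assuming the summarization pipeline forwards `truncation` to its tokenizer:

```python
# Truncate over-long inputs to the model's maximum length before encoding.
summary = summarizer(paragraph, truncation=True)[0]['summary_text']
```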
Is there a way to reset the GPU after a failure in the pipeline so the loop can continue as expected?