I’m on the organization Lab plan and trying to use GPU-Accelerated Inference with the facebook/bart-large-mnli model. I’m using it for zero-shot text classification and passing 10 candidate labels. With GPU-Accelerated Inference enabled I get a 400 Bad Request error. The same request works without GPU, but the latency is not acceptable.
This is the error message -
"error": "CUDA error: out of memory\nCUDA kernel errors might be asynchronously reported at****some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1."
Any leads on this would be really helpful.