[HELP] CUDA Error: out of memory with facebook/bart-large-mnli

I’m on the organization lab plan and trying to use GPU-Accelerated Inference with the facebook/bart-large-mnli model. I’m using it for zero-shot text classification, passing 10 candidate labels. With GPU-Accelerated Inference enabled I get a 400 Bad Request error; without the GPU it works, but the latency is not acceptable.
This is the error message:

{
  "error": "CUDA error: out of memory\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1."
}
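
For reference, this is roughly how I’m calling the API. The input text, labels, and token below are placeholders, and the `use_gpu` option is what I understand enables GPU-Accelerated Inference on paid plans (please correct me if that’s wrong):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}  # placeholder token

payload = {
    "inputs": "The new phone has an amazing camera but poor battery life.",
    "parameters": {
        # 10 candidate labels, as in my setup (example labels)
        "candidate_labels": [
            "electronics", "photography", "battery", "pricing", "design",
            "software", "shipping", "support", "durability", "audio",
        ],
    },
    # Assumption: this option requests GPU-Accelerated Inference on paid plans
    "options": {"use_gpu": True},
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.status_code, response.json())
```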

Any leads on this would be really helpful.
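
In the meantime, one workaround I’m experimenting with is splitting the 10 candidate labels across smaller requests and merging the scores client-side. Zero-shot classification runs one premise/hypothesis pair per label through the model, so fewer labels per request should mean a smaller batch on the GPU. This is just a sketch, and it assumes the `multi_label` parameter (so each label is scored independently and scores from different chunks are comparable without renormalizing):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}  # placeholder token

def classify_chunked(text, labels, chunk_size=5):
    """Query the API in chunks of candidate labels and merge the scores."""
    merged = {}
    for i in range(0, len(labels), chunk_size):
        chunk = labels[i:i + chunk_size]
        payload = {
            "inputs": text,
            "parameters": {
                "candidate_labels": chunk,
                # multi_label scores each label independently, so
                # results from different chunks can be merged directly
                "multi_label": True,
            },
            # Assumption: GPU option as per the accelerated inference plan docs
            "options": {"use_gpu": True},
        }
        resp = requests.post(API_URL, headers=HEADERS, json=payload)
        resp.raise_for_status()
        result = resp.json()
        merged.update(zip(result["labels"], result["scores"]))
    # Sort labels by score, highest first
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```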

Hi! Any updates on this one?
I’m facing the same error when using “facebook/bart-large” on a translation task.

Any updates here?