"CUDA error: all CUDA-capable devices are busy or unavailable" when using

When I try to call the http://api-inference.huggingface.co/gpu endpoint, I get the following error:

{'error': 'CUDA error: all CUDA-capable devices are busy or unavailable\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.'}

The same code works against http://api-inference.huggingface.co/cpu.

Is there anything I am missing to use accelerated inference on GPU?
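For context, the request I am sending looks roughly like this. This is a minimal sketch: the payload contents and the `hf_xxx` token are placeholders, and only the `/gpu` vs. `/cpu` URL suffix differs between the failing and working calls.

```python
import json
import urllib.request

# Placeholder values, not real credentials or inputs.
API_URL = "http://api-inference.huggingface.co/gpu"  # the /cpu variant works
TOKEN = "hf_xxx"  # placeholder API token

payload = json.dumps({"inputs": "Hello world"}).encode("utf-8")
req = urllib.request.Request(
    API_URL,
    data=payload,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) sends the request; against the /gpu URL it
# returns the CUDA error above, while the /cpu URL returns a normal result.
```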