I have a deployment in my Space, but the GPU is not being used. I installed CUDA 11.8 and I am using torch.
Is it a ZeroGPU Space?
If so, you’ll need to follow a slightly different procedure.
If not, that’s strange. Maybe you forgot to move the model or pipeline with .to("cuda").
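A quick sanity check worth running first, before suspecting the model code: confirm that torch can see the GPU at all. This is a minimal sketch; the commented-out model lines use hypothetical names just to illustrate the pattern.

```python
import torch

# If this prints False, .to("cuda") can never work: you likely have a
# CPU-only torch build or a driver/runtime mismatch.
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)  # CUDA version torch was built against; None on CPU-only builds

# Hypothetical model/pipeline names, shown only to illustrate the pattern:
# model = AutoModel.from_pretrained("some/model").to("cuda")
# inputs = {k: v.to("cuda") for k, v in inputs.items()}  # inputs must move too, not just the model
```

Note that the inputs have to be moved to the same device as the model; forgetting that is a common reason the GPU appears unused.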
Hi John,
Thank you for the quick response.
When I send a request to the inference endpoint, I can see the CPU being used but the GPU never gets picked up, even though I have included .to("cuda"). I am using the 1x NVIDIA L4. Just a snapshot below: when the request is sent, CPU usage touches 44% but the GPU stays idle.
It’s getting weirder and weirder…
The only unusual thing is that CUDA 11.8 is much older than the 12.4 that is common on HF, but torch should still work with an older version…
And if there isn’t enough VRAM, it should offload properly as long as accelerate is installed.
I think trying to load some unrelated model into the GPU might help isolate the problem.
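To isolate it that way, a tiny unrelated workload is enough: if this runs on the GPU, the hardware/driver side is fine and the problem is in the app code. A minimal sketch (it falls back to CPU so it still runs anywhere):

```python
import torch

# Smoke test with an unrelated workload, independent of your model.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(512, 512, device=device)
y = x @ x  # any real compute will show up in GPU utilization
if device == "cuda":
    torch.cuda.synchronize()  # make sure the kernel actually ran before checking
print(device, y.shape)
```

If `device` comes out as `cpu` here on an L4 Space, the torch build itself can’t see the GPU, which would explain the symptoms regardless of `.to("cuda")`.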