Dear all,
I am using the HF Spaces paid plan (ZeroGPU) with persistent storage.
I am running the standard code below to load “meta-llama/Llama-3.2-11B-Vision-Instruct” and its processor.
It took 6 minutes to get an answer to a very simple question, “who is Donald Trump?”. I tried it many times, and it always took too much time.
But the existing Spaces developed by the community that use “meta-llama/Llama-3.2-11B-Vision-Instruct” take 2-3 seconds. What is the problem, please?
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor
The short answer is that you are not using the GPU in the Zero GPU space: it is only enabled at the moment you explicitly request it.
I can elaborate a bit more on this if you don’t mind me taking a look at the entire source of the Space.
@John6666
As shown in the previous image, even after setting my hardware to GPU, I tested whether the GPU is used or not, and the answer was
“GPU is not available. The model will use the CPU.”
Any help, please? I am really wasting my time. I cannot add new code to finish my task; I am just stuck on this GPU issue.
# Check if GPU is available and print message
if torch.cuda.is_available():
    print("GPU is available. The model will use the GPU.")
else:
    print("GPU is not available. The model will use the CPU.")
I think I may have found the cause. The T4 has 16GB of VRAM, which is not enough for the 11B model. The Zero GPU space can use 40GB of VRAM for a moment, so it works with the 11B model.
With 16GB, a 4-bit quantized model should work, so maybe you could try that.
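As a rough back-of-the-envelope check of the VRAM numbers above, here is a small sketch. It assumes a nominal 11 billion parameters and counts only the weights (activations, the KV cache and quantization overhead are ignored), so real usage will be higher. The helper name weight_memory_gb is just for illustration.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: a nominal 11e9 parameters; real usage is higher because
# activations, the KV cache and quantization overhead are not counted.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 11e9  # nominal "11B"

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone are already larger than the T4's 16GB, while the 4-bit weights leave room to spare.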
There is also a way to quantize a regular model at load time.
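A minimal sketch of quantizing at load time, assuming the bitsandbytes package is installed next to transformers and a CUDA GPU is available. The function name load_quantized and the NF4 / bfloat16 settings are illustrative choices, not something taken from the Space in question.

```python
# Sketch: 4-bit quantization at load time via transformers + bitsandbytes.
# Assumptions: bitsandbytes is installed and a CUDA GPU is present.

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Illustrative settings: 4-bit weights using the NF4 quantization type.
QUANT_KWARGS = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_quantized(model_id: str = MODEL_ID):
    """Load the model with 4-bit weights, plus its processor."""
    # Heavy imports are kept inside the function so that importing this
    # file does not require a GPU.
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        MllamaForConditionalGeneration,
    )

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
        **QUANT_KWARGS,
    )
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the weights on the GPU
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```

Calling load_quantized() downloads the full-precision checkpoint and quantizes it while loading, so the first start is slow, but the weights in VRAM end up much smaller.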
No, John. “GPU is not available. The model will use the CPU.” is related to the unavailability of the CUDA library.
I could solve the problem on my local laptop by installing the needed PyTorch and CUDA libraries,
but for HF Spaces, these CUDA libraries should be installed by default on the VM/container. I don’t have access to the VM/container to install such files.
Can you share it with other Hugging Face people who can help?
Sorry, I forgot to explain the basic use of the Zero GPU space.
Zero GPU space makes CUDA invisible except to functions with the @spaces.GPU decorator or code at global scope.
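A minimal sketch of that pattern, reusing the same availability check from earlier in the thread. The fallback decorator and the function names cuda_message and check_inside_gpu_function are assumptions added so the file also runs outside Spaces; on a Zero GPU space the real spaces package is used.

```python
# Sketch: do CUDA work inside a function decorated with @spaces.GPU.
try:
    import spaces  # provided on Zero GPU spaces; import it before torch
    gpu = spaces.GPU
except ImportError:
    # Assumption: no-op fallback so this file also runs outside Spaces.
    def gpu(func):
        return func

try:
    import torch
except ImportError:
    torch = None  # lets the sketch run even where PyTorch is missing


def cuda_message() -> str:
    """Return the same message as the check earlier in the thread."""
    if torch is not None and torch.cuda.is_available():
        return "GPU is available. The model will use the GPU."
    return "GPU is not available. The model will use the CPU."


@gpu
def check_inside_gpu_function() -> str:
    # On a Zero GPU space, a GPU is attached only while this
    # decorated function is running.
    return cuda_message()


print("inside @spaces.GPU function:", check_inside_gpu_function())
```

In a real Space, the model.generate(...) call would go inside the decorated function (typically the one wired to the Gradio button), not in an undecorated helper.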