Hi all,
I have developed a simple Gradio app that loads my fine-tuned model (based on Mistral OpenOrca 7B). It works locally on my server, but when I launch it on Hugging Face Spaces with the CPU basic hardware (2 vCPU, 16 GB RAM), I get an error on:
```python
tokenizer = AutoTokenizer.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    torch_dtype=torch.bfloat16,
)
```
I get:

```
runtime error
Memory limit exceeded (16Gi)
```
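If my back-of-the-envelope math is right (the exact parameter count depends on the checkpoint, so these numbers are approximate), the weights alone nearly fill the 16 GiB limit in bfloat16, and loading can peak even higher:

```python
# Rough memory estimate for the weights of a ~7B-parameter model.
# Actual usage is higher: the loading process, tokenizer, Gradio, and
# activations all add on top of this.
n_params = 7_000_000_000  # approximate parameter count

bytes_per_param = {"float32": 4, "bfloat16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gib = n_params * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB just for the weights")
```

So bfloat16 already needs roughly 13 GiB for the weights before anything else runs.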
Do you have any suggestions on how I can optimize memory usage to avoid this error?
Best,
Sergio