Gradio app runtime error

Hi All,
I have developed a simple Gradio app that loads my fine-tuned model (based on Mistral OpenOrca 7B).
It works locally on my server, but when I launch it on Hugging Face Spaces with the CPU basic hardware (2 vCPU, 16 GB), I get an error on:

tokenizer = AutoTokenizer.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    use_fast=False,
)

model = AutoModelForCausalLM.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    torch_dtype=torch.bfloat16,
)

I get:
runtime error
Memory limit exceeded (16Gi)

Do you have any suggestions for optimizing this to avoid the error?

Best,
Sergio

There is only 16 GB of RAM in the CPU basic Space, so I think the model you are trying to use is simply too big to fit. A 7B-parameter model in bfloat16 needs roughly 13–14 GiB for the weights alone, before counting the tokenizer, the PyTorch runtime, and any temporary buffers allocated during loading, so the 16 Gi limit is easily exceeded.
Try a smaller model first, or a quantized version of this one.
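As a quick sanity check before deploying, you can estimate the weight footprint from the parameter count and dtype size. This is a back-of-the-envelope sketch; the ~7.24B parameter count is the approximate figure for Mistral-7B-class models, and it ignores runtime overhead:

```python
# Rough weight-memory estimate for a Mistral-7B-class model on CPU.
# Parameter count is approximate; real peak usage during loading is higher.
PARAMS = 7.24e9  # ~7.24 billion parameters

BYTES_PER_PARAM = {
    "float32": 4,    # default if torch_dtype is not set
    "bfloat16": 2,   # what the app above requests
    "int8": 1,       # 8-bit quantization
    "int4": 0.5,     # 4-bit quantization (e.g. GGUF Q4 variants)
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{dtype:>8}: ~{gib:.1f} GiB for weights alone")
```

The bfloat16 row comes out around 13.5 GiB, which leaves almost no headroom in a 16 Gi Space; the float32 row (what you get if `torch_dtype` is omitted) would be roughly double that. This is why a smaller or quantized model is the practical fix on CPU-only hardware.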