Hi all,
I have developed a simple Gradio app that loads my fine-tuned model (based on Mistral OpenOrca 7B). It works locally on my server, but when I launch it on Hugging Face Spaces with the CPU basic hardware (2 vCPU, 16 GB RAM), I get an error on:
```python
tokenizer = AutoTokenizer.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
    ie_model_id,
    token=usr_tkn,
    torch_dtype=torch.bfloat16,
)
```
I get:

```
runtime error
Memory limit exceeded (16Gi)
```
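If my back-of-the-envelope math is right (the exact parameter count depends on the checkpoint, so these numbers are approximate), the weights alone nearly fill the 16 GiB limit in bfloat16, and loading can peak even higher:

```python
# Rough memory estimate for the weights of a ~7B-parameter model.
# Actual usage is higher: the loading process, tokenizer, Gradio, and
# activations all add on top of this.
n_params = 7_000_000_000  # approximate parameter count

bytes_per_param = {"float32": 4, "bfloat16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gib = n_params * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB just for the weights")
```

So bfloat16 already needs roughly 13 GiB for the weights before anything else runs.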
Do you have any suggestions on how I can optimize memory usage to avoid this error?
Best,
Sergio