Hi there, did you ever find a solution for this? I'm having the same issue here. I've run this code:
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# storage_model_path points at the locally downloaded checkpoint
config = AutoConfig.from_pretrained(storage_model_path)
tokenizer = AutoTokenizer.from_pretrained(storage_model_path)
model = AutoModelForCausalLM.from_pretrained(
    storage_model_path,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    local_files_only=True,
)
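For what it's worth, one variant I haven't timed yet is passing low_cpu_mem_usage=True (a real transformers flag, but I haven't confirmed it helps in this setup). Sketching it here in case someone can confirm whether it matters:

# Variant with low_cpu_mem_usage=True -- creates the model on the meta
# device and loads the checkpoint shard-by-shard instead of building a
# full in-memory copy first. Requires accelerate; untested on my end
# for this path. Reuses the imports and names from the block above.
model = AutoModelForCausalLM.from_pretrained(
    storage_model_path,
    config=config,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    local_files_only=True,
)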
It still takes about 30 minutes, as opposed to 45 seconds when loading directly from the Hub.
Env requirements:
transformers==4.41.2
torch==2.2.2
requests==2.31.0
accelerate==0.31.0
Using a Databricks 14.3 ML cluster with CUDA 11.8. I'm not sure if it's a storage read-throughput limit or something on the transformers side?
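To test the read-throughput theory, here's a quick timing snippet I put together. The shard filename is a guess; substitute whatever weight file actually exists under storage_model_path:

# Quick-and-dirty read-throughput check for the storage path.
# Reads one weight shard in 64 MiB chunks and reports MB/s.
import os
import time

# Hypothetical shard name -- adjust to a real file in storage_model_path.
shard = os.path.join(storage_model_path, "model-00001-of-00002.safetensors")

chunk_size = 64 * 1024 * 1024
total = 0
start = time.time()
with open(shard, "rb") as f:
    while True:
        data = f.read(chunk_size)
        if not data:
            break
        total += len(data)
elapsed = time.time() - start
print(f"read {total / 1e6:.0f} MB in {elapsed:.1f} s -> {total / 1e6 / elapsed:.1f} MB/s")

If that reports something far below the disk's rated speed, the bottleneck is the storage mount rather than transformers.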
Would appreciate it if anyone has a fix for this.