Loading a locally saved model is very slow

Hello all, I am loading a Llama 2 model from the HF Hub on a g5.48xlarge SageMaker notebook instance using the commands below, and it takes less than 5 minutes to complete the loading.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit quantized load (bitsandbytes), fp16 for the non-quantized modules
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, torch_dtype=torch.float16)

Thereafter I save the model locally using the code below:

save_directory = "models/Llama-2-7b-chat-hf"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
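
For reference, a quick way to see what actually got written to disk (file names and shard sizes) is something like the sketch below, using the same save_directory as above:

import os

# list every file save_pretrained produced, with its size in MB
for name in sorted(os.listdir(save_directory)):
    path = os.path.join(save_directory, name)
    size_mb = os.path.getsize(path) / 1e6
    print(f"{name}: {size_mb:.1f} MB")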

Then I try loading the locally saved model using the code below:

model = AutoModelForCausalLM.from_pretrained(save_directory, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(save_directory)

But it takes around 30 minutes to load the same model from local storage, compared to less than 5 minutes from the HF Hub. What could be the reason behind this, and how can I load the model faster?
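
To narrow down where the time goes, I'm thinking of timing the raw disk reads separately from the from_pretrained call, roughly like this (just a sketch; the *.bin / *.safetensors shard patterns are an assumption about what save_pretrained wrote in my case):

import glob, os, time
from transformers import AutoModelForCausalLM

# time raw reads of the weight shards (pure disk throughput)
shards = glob.glob(os.path.join(save_directory, "*.bin")) + \
         glob.glob(os.path.join(save_directory, "*.safetensors"))
start = time.perf_counter()
total = 0
for shard in shards:
    with open(shard, "rb") as f:
        while chunk := f.read(64 * 1024 * 1024):  # 64 MB chunks
            total += len(chunk)
print(f"raw read: {total / 1e9:.1f} GB in {time.perf_counter() - start:.1f} s")

# time the actual model load for comparison
# note: the raw read above warms the OS file cache, so for a fair comparison
# run the two timings in separate sessions
start = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained(save_directory, device_map="auto")
print(f"from_pretrained: {time.perf_counter() - start:.1f} s")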

Any answer would be appreciated, Thanks.

Hi there, did you ever find a solution for this? I'm having the same issue here. I have run this code:
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained(storage_model_path)

tokenizer = AutoTokenizer.from_pretrained(storage_model_path)

model = AutoModelForCausalLM.from_pretrained(
    storage_model_path,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    local_files_only=True,
)

And it still takes 30 mins as opposed to 45 seconds when loading from the hub directly.

Env requirements:
transformers==4.41.2
torch==2.2.2
requests==2.31.0
accelerate==0.31.0

Using a Databricks 14.3 ML cluster with CUDA version 11.8. Not sure if it's a read-throughput setting on the storage side or something on the transformers side?
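
One thing I'm planning to try, to rule out the storage mount itself, is copying the model folder to node-local disk first and loading from there. Rough sketch (the /local_disk0 destination is just an assumption about where node-local storage is mounted; adjust for your cluster):

import shutil
import torch
from transformers import AutoModelForCausalLM

# copy from the (possibly remote / FUSE-mounted) storage path to node-local disk
local_copy = "/local_disk0/Llama-2-7b-chat-hf"  # assumed node-local path
shutil.copytree(storage_model_path, local_copy, dirs_exist_ok=True)

# then load from the local copy and compare the load time
model = AutoModelForCausalLM.from_pretrained(
    local_copy,
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)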

Would appreciate it if anyone has a fix for this.