Why is loading the llama2 model so slow?

It took me about 1 hour to load the llama2-7b-hf model, which seems far too long. What can I do to resolve this issue?
The code is as follows:

from transformers import AutoModelForCausalLM
model_dir = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_dir)

Issue solved. It was a disk problem: I copied the model to a “close” disk and the loading time dropped to 7~8 minutes.
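
For anyone who wants to check whether the disk is the bottleneck, here is a minimal sketch. It assumes that a “close” disk means one attached directly to the machine rather than a network mount. The helper names and both directory arguments are hypothetical. The first function reads every file in the model folder once and reports how long that took; the second copies the folder to another disk.

```python
import shutil
import time
from pathlib import Path


def measure_read_throughput(model_dir, chunk_size=16 * 1024 * 1024):
    """Read every file under model_dir once and return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file():
            with open(path, "rb") as f:
                # Read in chunks so large weight files never sit fully in memory.
                while chunk := f.read(chunk_size):
                    total += len(chunk)
    return total, time.perf_counter() - start


def copy_to_local_disk(src_dir, dst_dir):
    """Copy the model folder to dst_dir (a hypothetical local path) and return it."""
    # dirs_exist_ok lets the copy be re-run without failing on an existing folder.
    return Path(shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True))
```

Dividing the byte count by the seconds gives a rough read speed. A second run over the same folder may look much faster because the operating system caches files in memory, so the first run is the more telling one. If the speed is low, copying the folder with copy_to_local_disk and passing the new path to from_pretrained is one thing to try.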

Can you explain what a “close” disk refers to? I am facing a similar issue. I am using an ml.g5.12xlarge instance to run inference with the llama2 model. I downloaded the model locally using the snapshot_download method, but loading the model takes more than 30 minutes.
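
One way to narrow this down is to time the download and the load separately, so it is clear which stage is slow. The following is only a sketch under assumptions: the timed helper and the download_and_load function are hypothetical names, and local_dir should be whichever folder you believe is on a fast, locally attached disk.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def download_and_load(repo_id, local_dir):
    """Download a snapshot into local_dir, then load it, timing each stage."""
    # Imported lazily so the timing helper above works without these packages.
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForCausalLM

    timings = {}
    with timed("download", timings):
        path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    with timed("load", timings):
        # local_files_only avoids any network lookups during loading.
        model = AutoModelForCausalLM.from_pretrained(path, local_files_only=True)
    return model, timings
```

Calling download_and_load("meta-llama/Llama-2-7b-chat-hf", "/path/on/local/disk") (the path is a placeholder) returns the model along with a dict of seconds per stage. If the "load" entry dominates even though the files are already downloaded, the storage under local_dir is worth a closer look.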

