Why is loading the llama2 model so slow?

anxiangyinyun · July 24, 2023, 1:48pm

It took me about 1 hour to load the llama2-7b-hf model, which seems strange. What can I do to resolve this issue?
The code is as follows:

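import torch
from transformers import AutoModelForCausalLM

model_dir = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    local_files_only=True,      # use only files that are already on disk
    torch_dtype=torch.float16,  # load the weights in half precision
    device_map="auto",          # automatic device placement (needs accelerate)
)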

anxiangyinyun · July 26, 2023, 9:00am

Issue solved. It was a disk problem: I copied the model to a “close” disk and the loading time dropped to 7~8 minutes.
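Editor's note: since the fix above came down to which disk held the weights, one way to check a disk before loading is to time a plain sequential read of the weight files. The sketch below is only an illustration; the function name `measure_read_throughput`, the file suffixes, and the chunk size are assumptions, not something from the original post. A second run over the same files may be served from the operating system's cache, so the first run is the more telling number.

```python
import time
from pathlib import Path


def measure_read_throughput(model_dir, suffixes=(".safetensors", ".bin"),
                            chunk_size=16 * 1024 * 1024):
    """Read every weight file under model_dir once and report the read speed.

    Returns (total_bytes, elapsed_seconds, megabytes_per_second).
    The suffixes and chunk size are illustrative defaults (assumptions).
    """
    # Collect weight files; is_file() follows symlinks, so cache layouts
    # that link to the real files are handled too.
    files = sorted(
        p for p in Path(model_dir).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    total_bytes = 0
    start = time.perf_counter()
    for path in files:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total_bytes, elapsed, mb_per_s
```

Running it once on the original location and once on the copy gives a rough comparison of the two disks, independent of anything transformers does while loading.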

vartu · August 11, 2023, 8:32pm

Can you explain what “close” disk refers to? I am facing a similar issue. I am using an ml.g5.12xlarge instance to run inference with the llama2 model. I downloaded the model locally using the snapshot_download method, but loading the model takes more than 30 minutes.

SUNM · August 13, 2023, 8:41am

Hi @philschmid, I hope you are doing well. Sorry, this is about fine-tuning llama2. I created a CSV file with the Alpaca structure, which has a text column including ### instruction, ### input, and ### response. I am confused about which method to use for fine-tuning the model with PEFT and QLoRA, since there are so many different code examples. Would you please point me to code that is right for fine-tuning with the Alpaca structure, and for saving the model and running inference to test it? In some code I saw that they truncate and pad with the tokenizer and set the labels to -100, while in other code no preprocessing is done. I appreciate your help. Many thanks.

teodor98 · January 23, 2024, 7:18am

Hey, did you find a solution? Loading the model used to take 3 minutes at most, and out of nowhere it now takes more than 30 minutes.

Thanks

lepotatoguy · January 29, 2024, 3:41pm

I solved this by setting local_files_only to True. (Note: I had previously downloaded LLaMA 2.)
The loading time went from around 3.2 hours to around 29 seconds after this change.

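from transformers import AutoModelForCausalLM

# model_name, bnb_config and device_map are defined elsewhere (not shown in this post)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    local_files_only=True,  # load only from files that are already downloaded
)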
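Editor's note: to see how much a setting such as local_files_only changes the loading time on a given machine, the call can be wrapped in a small timer. This is a minimal sketch using only the standard library; the helper name `timed` and its `results` argument are hypothetical and not part of transformers.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the wrapped block took; optionally store it in results."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.1f} s")


# Hypothetical usage, reusing the names from the post above:
# with timed("local_files_only=True"):
#     model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
```

Timing each variant in a fresh Python process keeps the comparison fair, because files read once may be cached by the operating system afterwards.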

saikumar305 · April 24, 2024, 7:42am

@lepotatoguy, can you please provide the bnb_config and device_map values in detail?
