Hello all, I am loading a Llama 2 model from the HF Hub on a g5.48xlarge SageMaker notebook instance using the commands below, and it takes less than 5 minutes to finish loading:
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_8bit=True, torch_dtype=torch.float16)
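For reference, on recent transformers versions the load_in_8bit kwarg is deprecated and the same 8-bit load is usually expressed through BitsAndBytesConfig. This is only a minimal sketch of the equivalent call, assuming a transformers version with bitsandbytes support; the behaviour should match the line above:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 8-bit quantization config, replacing the deprecated load_in_8bit=True kwarg
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",  # place the quantized weights on the GPU(s) as they load
)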
Thereafter I save the model locally with the code below:
save_directory = "models/Llama-2-7b-chat-hf"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
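In case the on-disk format is relevant to diagnosing this, here is a small snippet I added just for illustration that lists what save_pretrained actually wrote, since whether the weights land as pickle .bin shards or .safetensors shards can affect reload speed:

import os

# List every file save_pretrained wrote, with its size in GB
for name in sorted(os.listdir(save_directory)):
    path = os.path.join(save_directory, name)
    if os.path.isfile(path):
        print(f"{name}: {os.path.getsize(path) / 1e9:.2f} GB")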
Then I try loading the locally saved model using the code below:
model = AutoModelForCausalLM.from_pretrained(save_directory, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(save_directory)
However, it takes around 30 minutes to load the same model from local disk, compared to less than 5 minutes from the HF Hub. What could be the reason behind this, and how can I load the model faster?
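For completeness, I measured the load times roughly like this (a minimal sketch, just a timer wrapped around from_pretrained, nothing more precise):

import time

start = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained(save_directory, device_map="auto")
print(f"local load took {time.perf_counter() - start:.0f} s")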
Any answers would be appreciated. Thanks!