Loading a pretrained model locally

Hi there, I am using Hugging Face to test different LLMs. I am a bit unclear on what happens when I use, for example:

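import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
llama_model_id = "meta-llama/Llama-2-7b-chat-hf"
llama_tokenizer = AutoTokenizer.from_pretrained(llama_model_id)
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_id,  # the model ID has to be passed here as well
    torch_dtype=torch.bfloat16,
)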

The first time I execute this code, it takes a while to load, but afterwards it is very quick to "load the checkpoint shards". What is meant by that? Is this running locally, so that I am using the version from the first download, or is it still updating itself every time I execute this code? I want to use only one version for some automated prompt testing and was hoping to do this locally so there are no changes to the version or code.

Initially, I thought it would be running locally, but all of a sudden the prompt results have changed drastically, which is why I am asking. Thanks in advance!

The first time you run from_pretrained, it downloads the weights from the Hub to your machine and stores them in a local cache. When you rerun from_pretrained, the weights are loaded from that cache instead of being downloaded again.
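If the goal is to stay on exactly one version, one option is to pin a specific commit and forbid network access when loading. The sketch below is only an illustration under some assumptions: the helper names (cached_snapshots, pinned_kwargs, load_pinned) are made up for this example, the cache is assumed to use the usual models--{org}--{name}/snapshots/{commit} layout, and the commit hash has to be one you look up yourself. The revision and local_files_only arguments are existing from_pretrained parameters.

```python
from pathlib import Path


def cached_snapshots(model_id: str, cache_dir: Path) -> list[str]:
    """List the commit hashes of a model that are already in the local cache.

    Assumes the usual Hub cache layout:
    <cache_dir>/models--<org>--<name>/snapshots/<commit_hash>/
    """
    repo_dir = cache_dir / ("models--" + model_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p.name for p in snapshots.iterdir() if p.is_dir())


def pinned_kwargs(revision: str, offline: bool = True) -> dict:
    """Build from_pretrained keyword arguments that pin one version.

    revision: a commit hash, tag, or branch name on the Hub.
    local_files_only: when True, only the local cache is used.
    """
    return {"revision": revision, "local_files_only": offline}


def load_pinned(model_id: str, revision: str, offline: bool = True):
    """Load tokenizer and model at a fixed revision (hypothetical helper)."""
    # Heavy imports are kept inside the function so the helpers above
    # can be used without torch or transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = pinned_kwargs(revision, offline)
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, **kwargs
    )
    return tokenizer, model
```

With the default cache location, cached_snapshots("meta-llama/Llama-2-7b-chat-hf", Path.home() / ".cache" / "huggingface" / "hub") should show which commits you have locally, and one of those hashes can then be passed to load_pinned. The first download still needs offline=False. Finding more than one hash in that list would suggest that the files on the Hub changed between runs and a newer snapshot was fetched.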
