Loading a retrained model locally

elifnurd · February 5, 2024, 1:51pm

Hi there, I am using Hugging Face to test different LLMs. I am a bit unclear on what happens when I use, for example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llama_model_id = 'meta-llama/Llama-2-7b-chat-hf'
llama_tokenizer = AutoTokenizer.from_pretrained(llama_model_id)
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_id,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)

The first time I execute this code, it takes some time to load, but afterwards it is very quick to ‘load the checkpoint shards’. What is meant by that? Is this running locally, so that I will be using the version from the first download, or is it still updating itself every time I execute this code? I want to use only one version for some automated prompt testing and was hoping to do this locally so there are no changes to the version/code.

Initially, I thought it would be running locally, but all of a sudden the prompt results have changed drastically, which is why I am asking. Thanks in advance!

nielsr · February 5, 2024, 8:38pm

The first time you run from_pretrained, it downloads the weights from the hub to your machine and stores them in a local cache. This means that when you rerun from_pretrained, the weights are loaded from your cache.
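
To check which version is actually sitting in that cache, and to make sure the same one is used on every run, something like the sketch below may help. It assumes the default cache layout (a models--<org>--<name>/snapshots/<commit> folder under ~/.cache/huggingface/hub, or under HF_HOME / HF_HUB_CACHE if those are set). The helper names hub_cache_dir, cached_revisions, and load_pinned are made up for illustration and are not part of transformers; revision and local_files_only are existing from_pretrained arguments.

```python
import os
from pathlib import Path
from typing import List, Optional


def hub_cache_dir() -> Path:
    """Best guess at the hub cache location (assumes the default layout)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    default_home = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return Path(os.environ.get("HF_HOME", default_home)) / "hub"


def cached_revisions(model_id: str, cache_dir: Optional[Path] = None) -> List[str]:
    """Return the commit hashes of the snapshots cached for model_id."""
    base = Path(cache_dir) if cache_dir is not None else hub_cache_dir()
    snapshots = base / ("models--" + model_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []  # nothing cached yet (or a non-default cache location)
    return sorted(p.name for p in snapshots.iterdir() if p.is_dir())


def load_pinned(model_id: str, revision: str):
    """Load tokenizer and model from the cache only, at one fixed commit."""
    # Heavy imports are kept inside so the helpers above work without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        model_id, revision=revision, local_files_only=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        revision=revision,          # pin to one commit of the model repo
        local_files_only=True,      # never contact the hub
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    return tokenizer, model


if __name__ == "__main__":
    # Lists whatever commits happen to be cached on this machine.
    print(cached_revisions("meta-llama/Llama-2-7b-chat-hf"))
```

If the list contains a single hash, passing it as revision to load_pinned should keep every run on exactly those files. Without a pinned revision, from_pretrained resolves the repo's default branch and, when online, can pick up newer files if the repo has been updated, which is one possible reason for outputs changing between runs. Setting the HF_HUB_OFFLINE=1 environment variable is another way to prevent any network access.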

system · February 11, 2024, 9:37am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
