Hi there, I am using Hugging Face to test different LLMs. I am a bit unclear on what happens when I run, for example:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

llama_model_id = 'meta-llama/Llama-2-7b-chat-hf'
llama_tokenizer = AutoTokenizer.from_pretrained(llama_model_id)
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_id,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
The first time I execute this code it takes a while, but afterwards the 'Loading checkpoint shards' step finishes very quickly. What is meant by that? Is this running locally, so I will keep using the version from the first download, or is it still updating itself every time I execute this code? I want to use only one version for some automated prompt testing, and I was hoping to do this locally so there are no changes to the version/code.
Initially I thought it would be running locally, but all of a sudden the prompt results changed drastically, which is why I am asking.
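In case it clarifies what I am after, this is roughly how I was planning to pin things down once the first download has populated the cache. The revision value below is just a placeholder; I would replace it with the actual commit hash from the model's 'Files and versions' tab on the Hub:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

llama_model_id = 'meta-llama/Llama-2-7b-chat-hf'

# Placeholder: replace with the real commit hash from the Hub model page,
# so the exact same snapshot is used every run.
pinned_revision = 'PLACEHOLDER_COMMIT_HASH'

llama_tokenizer = AutoTokenizer.from_pretrained(
    llama_model_id,
    revision=pinned_revision,
)
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_id,
    revision=pinned_revision,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    # Once the files are cached locally, this should prevent any re-download
    # (it raises an error if the pinned revision is not in the cache yet):
    local_files_only=True,
)

Is that the right approach to freeze the version? Thanks in advance!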