I’m trying to download a gated model (Llama 2) that I have access to:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    # load_in_8bit=True,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    # rope_scaling={"type": "dynamic", "factor": scaling_factor},
)
I have access to this model (I know because I can successfully download the model to other hosts on my cluster). When I try downloading on one particular machine, I see the following:
Downloading config.json: 100%|████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 2.71MB/s]
The process then hangs immediately after this. I left it running overnight to no avail, and retrying on other days hasn’t helped either.
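The only diagnostic step I’ve come up with so far is forcing verbose hub logging and a hard download timeout, so the hang turns into an error message I can actually read. The environment variable names below are taken from the huggingface_hub documentation as best I can tell; they need to be set before transformers is imported:

```python
import os

# Assumption: these huggingface_hub environment variables are honored
# by recent versions of the library; set them BEFORE importing transformers.
os.environ["HF_HUB_VERBOSITY"] = "debug"      # log every hub request
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "30"  # seconds before a request fails

# ...then import transformers and call from_pretrained() as above.
```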
Why is this happening and how do I solve it?
As a possibly relevant observation, I think I may have interrupted a previous download of this model on this particular machine, though I can’t quite remember. I’ve already tried clearing ~/.cache/huggingface/hub,
but maybe I need to clear/reset something else?
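In case it matters, here’s the snippet I’ve been using to look for leftover lock files from the interrupted download. The .locks location is my assumption based on the default hub cache layout, so this may be looking in the wrong place:

```python
from pathlib import Path

def find_stale_locks(cache_dir: str = "~/.cache/huggingface/hub") -> list:
    """Return any leftover .lock files under the hub cache.

    Assumption: an interrupted download can leave stale lock files in
    <cache_dir>/.locks (path based on the default HF cache layout).
    """
    locks_dir = Path(cache_dir).expanduser() / ".locks"
    if not locks_dir.is_dir():
        return []
    return sorted(locks_dir.rglob("*.lock"))

print(find_stale_locks())
```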