load_checkpoint_and_dispatch checkpoint ValueError on SageMaker

Hi, I am following the CogVLM model card: https://huggingface.co/THUDM/cogvlm-chat-hf

I have spun up a SageMaker ml.g4dn.12xlarge instance with 4 x 16 GB GPUs.

I tried to follow the code from the Hugging Face link above, but I get an error at the `load_checkpoint_and_dispatch` call:

```python
model = load_checkpoint_and_dispatch(
    model,
    "~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/",  # typical, '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/balabala'
    device_map=device_map)
```

This is the output of `!ls ~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/`:

```
config.json                       model-00005-of-00008.safetensors
configuration_cogvlm.py           model-00006-of-00008.safetensors
generation_config.json            model-00007-of-00008.safetensors
model-00001-of-00008.safetensors  model-00008-of-00008.safetensors
model-00002-of-00008.safetensors  modeling_cogvlm.py
model-00003-of-00008.safetensors  model.safetensors.index.json
model-00004-of-00008.safetensors  visual.py
```

This is the error message:

```
ValueError: checkpoint should be the path to a file containing a whole state dict, or the index of a sharded checkpoint, or a folder containing a sharded checkpoint or the whole state dict, but got ~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/.
```
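The `!ls` call succeeds because the shell expands `~` to the home directory, whereas Python path functions such as `os.path.isdir` take the string literally. The sketch below assumes, based only on the wording of the error, that the loader tests the path with `os.path.isfile` / `os.path.isdir` before raising; `classify_checkpoint` and `demo` are hypothetical helpers, not part of accelerate. It builds a fake snapshot under a temporary home directory and checks two spellings of the same folder.

```python
import os
import tempfile


def classify_checkpoint(checkpoint: str) -> str:
    """Hypothetical approximation of the checks the ValueError describes:
    the string is tested exactly as given, with no "~" expansion."""
    if os.path.isfile(checkpoint):
        return "index file" if checkpoint.endswith(".json") else "state dict file"
    if os.path.isdir(checkpoint):
        has_index = any(name.endswith(".index.json") for name in os.listdir(checkpoint))
        return "sharded folder" if has_index else "folder without index"
    return "not found"  # the case that corresponds to the ValueError


def demo() -> tuple:
    """Create a fake snapshot under a temporary home and classify two path spellings."""
    saved = {key: os.environ.get(key) for key in ("HOME", "USERPROFILE")}
    with tempfile.TemporaryDirectory() as fake_home:
        os.environ["HOME"] = fake_home         # "~" is expanded from HOME on POSIX
        os.environ["USERPROFILE"] = fake_home  # and from USERPROFILE on Windows
        try:
            snapshot = os.path.join(fake_home, ".cache", "snapshot")
            os.makedirs(snapshot)
            open(os.path.join(snapshot, "model.safetensors.index.json"), "w").close()
            tilde = "~/.cache/snapshot/"
            return classify_checkpoint(tilde), classify_checkpoint(os.path.expanduser(tilde))
        finally:
            # Restore the caller's environment variables.
            for key, value in saved.items():
                if value is None:
                    os.environ.pop(key, None)
                else:
                    os.environ[key] = value


print(demo())  # ('not found', 'sharded folder')
```

The literal `~` spelling falls through to the "not found" branch, while the expanded spelling resolves to the same folder that `!ls` listed.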

Have you tried replacing it with a full (absolute) path instead of a relative one?


Hi @Soraheart1988, were you able to solve your issue? As @tomasremitz said, you should probably try using the absolute path.

Just for reference for other folks, the following worked for me:

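```python
# model and device_map are created as in the model card code
path_to_index = '/root/.cache/.../model.safetensors.index.json'
model = load_checkpoint_and_dispatch(model, path_to_index, device_map=device_map)
```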

Notice the `/root/...` prefix: it is an absolute path rather than one starting with `~`.
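To avoid hard-coding `/root`, the `~` path can be expanded in Python before it is passed in. Below is a minimal sketch, assuming the standard Hugging Face sharded-index layout, where `model.safetensors.index.json` holds a `"weight_map"` from parameter names to shard file names in the same folder. `absolute_index_path` is a hypothetical helper, not an accelerate or transformers API; it also checks that every shard named in the index is on disk.

```python
import json
import os
import tempfile


def absolute_index_path(snapshot_dir: str) -> str:
    """Hypothetical helper: expand "~", make the folder absolute, and return the path of
    its model.safetensors.index.json after checking that every listed shard exists."""
    folder = os.path.abspath(os.path.expanduser(snapshot_dir))
    index_path = os.path.join(folder, "model.safetensors.index.json")
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]  # parameter name -> shard file name
    shards = sorted(set(weight_map.values()))
    missing = [s for s in shards if not os.path.isfile(os.path.join(folder, s))]
    if missing:
        raise FileNotFoundError(f"shards listed in the index but not on disk: {missing}")
    return index_path


# Self-check on a fake two-shard snapshot in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    shard_names = ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
    for name in shard_names:
        open(os.path.join(tmp, name), "wb").close()  # empty placeholder shard
    index = {"metadata": {}, "weight_map": {"a.weight": shard_names[0], "b.weight": shard_names[1]}}
    with open(os.path.join(tmp, "model.safetensors.index.json"), "w", encoding="utf-8") as f:
        json.dump(index, f)
    resolved = absolute_index_path(tmp + os.sep)
    print(resolved == os.path.join(os.path.abspath(tmp), "model.safetensors.index.json"))  # True
```

With `model` and `device_map` set up as in the model card, the result could then be passed straight in, e.g. `load_checkpoint_and_dispatch(model, absolute_index_path("~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/"), device_map=device_map)`.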

Thanks all, I managed to solve it by using the absolute path.
