Load_checkpoint_and_dispatch checkpoint value error using Sagemaker

Soraheart1988 · December 6, 2023, 6:35am

Hi, I am following the instructions for the CogVLM model: https://huggingface.co/THUDM/cogvlm-chat-hf

I have spun up a SageMaker ml.g4dn.12xlarge instance with 4 GPUs of 16 GB each.

I tried to follow the code given at the Hugging Face link above, but I get an error at the load_checkpoint_and_dispatch call:

model = load_checkpoint_and_dispatch(
    model,
    "~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/",  # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/balabala'
    device_map=device_map,
)

This is the output of !ls ~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/

config.json model-00005-of-00008.safetensors
configuration_cogvlm.py model-00006-of-00008.safetensors
generation_config.json model-00007-of-00008.safetensors
model-00001-of-00008.safetensors model-00008-of-00008.safetensors
model-00002-of-00008.safetensors modeling_cogvlm.py
model-00003-of-00008.safetensors model.safetensors.index.json
model-00004-of-00008.safetensors visual.py

This is the error message:
ValueError: checkpoint should be the path to a file containing a whole state dict, or the index of a sharded checkpoint, or a folder containing a sharded checkpoint or the whole state dict, but got ~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/54b93e0af3f1d8badcdeefdb0d26b1dfbc227f7a/.

tomasremitz · December 26, 2023, 3:23pm

Hi!
Have you tried replacing it with a full path instead of a relative one?

marcsun13 · January 24, 2024, 3:30pm

Hi @Soraheart1988, were you able to solve your issue? As @tomasremitz said, you should probably try passing the absolute path.
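
For anyone wondering why the path form matters, here is a minimal sketch. It assumes the failure comes from the leading ~ never being expanded: the shell expands ~ (which is why !ls works), but Python's os.path functions treat it as a literal character, so the string does not resolve to an existing file or folder. os.path.expanduser from the standard library does the expansion explicitly. The variable names are only for illustration.

```python
import os

# The cache root as it would be typed in a shell; "~" is still a literal here.
raw_path = "~/.cache/huggingface/hub"

# expanduser replaces the leading "~" with the current user's home directory.
expanded_path = os.path.expanduser(raw_path)

print("raw path is absolute:     ", os.path.isabs(raw_path))
print("expanded path is absolute:", os.path.isabs(expanded_path))
print("expanded path:            ", expanded_path)
```

On a typical setup where the home directory can be determined, the expanded path is absolute and can be passed on in place of the original string.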

silverquimera · March 28, 2024, 7:09am

Just for reference for other folks, the following worked for me:

path_to_index = '/root/.cache/.../model.safetensors.index.json'
model = load_checkpoint_and_dispatch(
    model,
    path_to_index,   
    device_map=device_map,
)

Notice that the path is absolute and starts with /root/....
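
If you would rather not hard-code the home directory, the same idea can be sketched as a small helper. Note that resolve_checkpoint is a hypothetical function written for this post, not part of accelerate or transformers; it only uses the standard library and assumes the index file is named model.safetensors.index.json, as in the folder listing above.

```python
import os


def resolve_checkpoint(path, index_name="model.safetensors.index.json"):
    """Hypothetical helper: expand "~" and prefer the sharded index file.

    Returns an absolute path to the index file if it exists inside `path`,
    otherwise the absolute path itself (a single file or a folder).
    """
    # Expand "~" and make the path absolute before touching the filesystem.
    full_path = os.path.abspath(os.path.expanduser(path))

    # If the folder holds a sharded-checkpoint index, point at it directly.
    index_path = os.path.join(full_path, index_name)
    if os.path.isfile(index_path):
        return index_path

    # Otherwise fall back to the path itself, as long as it exists.
    if os.path.exists(full_path):
        return full_path

    raise FileNotFoundError(f"No checkpoint found at {full_path}")
```

The returned string can then be passed as the second argument of load_checkpoint_and_dispatch in place of the hard-coded path_to_index.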

Soraheart1988 · March 28, 2024, 8:21am

Thanks all, I managed to solve it by using the absolute path.

