How to load checkpoint shards with Gaudi instead of CPU?

gildesh · August 21, 2023, 7:39am

While fine-tuning Llama 2 70B, we ran into memory issues without distributed computing. It's somehow expecting a node address, and we can't pass the node address as an environment variable.

So we tried to accelerate the CPUs by using the HPUs. But while doing that, we failed to load the checkpoint shards. How can we load checkpoint shards with Gaudi instead of the CPU? We need to accelerate our CPUs with HPUs and load the checkpoint shards on the device.

@regis sorry for the barrage of questions!

regisss · August 21, 2023, 12:49pm

@gildesh

It's somehow expecting a node address, and we can't pass the node address as an environment variable.

This should only be needed if you want to use several nodes. If you have a single instance (8 devices), there is no need to specify any node address.

Loading checkpoint shards should work with DeepSpeed; I'm not sure whether it works without it. Could you give me a command so that I can reproduce it?
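For context on what "checkpoint shards" means here: a sharded Hugging Face checkpoint normally ships with an index JSON file whose `weight_map` maps each parameter name to the shard file that stores it. The sketch below is not from this thread and does not touch the HPU at all; it only reads such an index with the standard library so you can see which shard holds which weights. The helper names and the tiny demo index are made up for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_params_by_shard(index_path):
    """Return {shard_file: [param_name, ...]} from a sharded-checkpoint index."""
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    shards = defaultdict(list)
    # "weight_map" maps each parameter name to the shard file that stores it.
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return dict(shards)


def write_demo_index(directory):
    """Write a tiny, made-up index file (illustration only, not a real model)."""
    index = {
        "metadata": {"total_size": 48},
        "weight_map": {
            "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
            "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
            "lm_head.weight": "model-00002-of-00002.safetensors",
        },
    }
    path = Path(directory) / "model.safetensors.index.json"
    path.write_text(json.dumps(index), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shards = group_params_by_shard(write_demo_index(tmp))
        for shard_file, params in sorted(shards.items()):
            print(shard_file, len(params))
```

Pointing `group_params_by_shard` at the index file in your own checkpoint directory should show how the 70B weights are split, which can help when checking whether every shard is actually being read.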
