While fine-tuning Llama 2 70B, we ran into memory issues when running without distributed computing. The distributed setup, however, somehow expects a node address, and we can't pass the node address as an environment variable.
So, we tried to accelerate the CPUs by using the HPUs. But while doing that, we failed to load the checkpoint shards. How can we load checkpoint shards with Gaudi instead of the CPU? We need to accelerate our CPUs with HPUs and load the checkpoint shards on the device.
@regis sorry for the barrage of questions!