Save custom objects in the state for each process

aps · September 20, 2022, 12:52am

As of now, custom registered objects are not saved per process, as seen here. Is there a way to save and load a separate checkpoint for each registered custom object per process/rank in a distributed setting?

Alternatively, would you suggest gathering everything before saving and then redistributing it during loading?

sgugger · October 3, 2022, 3:30pm

I’m not fully understanding how it’s not supported by just saving what you want to save outside of Accelerate right now (so each process will save its own version). Could you tell us more about your use case?

SaulLu · October 4, 2022, 12:51pm

Thanks for the answer!

"I’m not fully understanding how it’s not supported by just saving what you want to save outside of Accelerate right now (so each process will save its own version)."

Yes, I think that should work. We did not take this approach and instead chose to register the object with Accelerate via register_for_checkpointing.

"Could you tell us more about your use case?"

Sure. In our case, we have a custom object that tracks the progress of the dataloader on each rank and for each worker id, so that we can resume training from where it stopped the previous time.

sgugger · October 4, 2022, 1:28pm

Ok, so in this case it does seem easier not to register the object for checkpointing and to save/load it manually instead (putting the process index in the name of the saved file, so you know which file to pick when reloading).
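
For anyone landing here later, a minimal sketch of that manual approach could look like the following. It is only an illustration under assumptions: `DataloaderProgress`, `rank_file`, `save_progress`, `load_progress` and the file naming scheme are hypothetical names, not part of Accelerate. The process index is passed in as a plain integer; in an Accelerate script it would typically come from `accelerator.process_index`.

```python
import json
from pathlib import Path


class DataloaderProgress:
    """Hypothetical tracker: samples consumed per dataloader worker on one rank."""

    def __init__(self):
        self.samples_seen = {}  # worker id -> number of samples consumed

    def update(self, worker_id, n=1):
        self.samples_seen[worker_id] = self.samples_seen.get(worker_id, 0) + n

    def state_dict(self):
        # JSON object keys must be strings, so worker ids are converted here.
        return {"samples_seen": {str(k): v for k, v in self.samples_seen.items()}}

    def load_state_dict(self, state):
        self.samples_seen = {int(k): v for k, v in state["samples_seen"].items()}


def rank_file(checkpoint_dir, process_index):
    # One file per process: the rank is part of the file name (assumed scheme).
    return Path(checkpoint_dir) / f"dataloader_progress_rank{process_index}.json"


def save_progress(progress, checkpoint_dir, process_index):
    # Every process calls this with its own index and writes its own file.
    path = rank_file(checkpoint_dir, process_index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(progress.state_dict()))
    return path


def load_progress(progress, checkpoint_dir, process_index):
    # On resume, each process reads back only the file matching its index.
    state = json.loads(rank_file(checkpoint_dir, process_index).read_text())
    progress.load_state_dict(state)
    return progress
```

Each process would call `save_progress(progress, checkpoint_dir, accelerator.process_index)` next to its regular checkpoint save, and `load_progress` with the same arguments when resuming. This assumes the job is resumed with the same number of processes, so that every rank finds its own file.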

SaulLu · October 4, 2022, 1:50pm

Thanks a lot for your feedback!
