I want to save the checkpoints directly to my Google Drive. The problem is that the code above saves my checkpoints fine up to the save limit. Once the limit is reached, though, it can't delete old checkpoints or save new ones, even though the console says the checkpoints were saved and deleted. Any help?
This could be a solution. But what if my runtime gets disconnected while training? My checkpoints would be lost then. So I actually need the checkpoints to be in my Drive after each save step.
Try this: after every save step, use "Interrupt execution" in Colab and save the checkpoint using this, then restart training from the saved checkpoint.
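For the copy step itself, something like the following might work. It is only a sketch: it assumes Drive is already mounted in the notebook, and the helper name `backup_checkpoint` and the example paths are made up for illustration, not taken from the code discussed above.

```python
import shutil
from pathlib import Path


def backup_checkpoint(checkpoint_dir, backup_root):
    """Copy one checkpoint folder into backup_root, replacing any older copy."""
    src = Path(checkpoint_dir)
    dst = Path(backup_root) / src.name
    if dst.exists():
        # Remove a stale or partially written copy before copying again.
        shutil.rmtree(dst)
    # copytree creates backup_root (and any missing parents) if needed.
    shutil.copytree(src, dst)
    return dst


# Hypothetical Colab usage (both paths are assumptions):
# backup_checkpoint("/content/output/checkpoint-500",
#                   "/content/drive/MyDrive/checkpoints")
```

Writing checkpoints to the local disk first and copying whole folders afterwards keeps the checkpoint rotation on the local filesystem, which may sidestep the delete problem described in the first post, though I haven't verified that.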
Can anyone please tell me how to resume training a transformer from where it left off by loading the previously saved checkpoints? It would be really appreciated. Thanks in advance.
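In case it helps, here is a rough sketch of one way to resume. It assumes the checkpoints were written by the Hugging Face `Trainer` into folders named `checkpoint-<step>`; the helper `latest_checkpoint` and the Drive path are hypothetical. The helper only picks the newest folder, and the commented lines show how it could be passed to `trainer.train(resume_from_checkpoint=...)`.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the path of the checkpoint-<step> folder with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps as integers so checkpoint-1000 beats checkpoint-500.
            found.append((int(match.group(1)), child))
    if not found:
        return None
    return str(max(found, key=lambda item: item[0])[1])


# Hypothetical usage with an already constructed Trainer named `trainer`:
# ckpt = latest_checkpoint("/content/drive/MyDrive/checkpoints")
# trainer.train(resume_from_checkpoint=ckpt)  # None starts training from scratch
```

The step number is parsed rather than sorting folder names as strings, because a plain string sort would rank `checkpoint-500` above `checkpoint-1000`.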