Can I resume training from a model that's been pushed to the hub?

Hey! I have a GPT-2 model I’m pretraining, and (due to being GPU poor) I’m using Kaggle and running the script in the background. I trained it for 2 epochs and I wanna continue for more, but since it’s on Kaggle the directories don’t get persisted between runs. Is it possible to continue training from what’s in the Hugging Face repo the model got pushed to, say from the safetensors file that’s been pushed there?


Yes, that works. Load the pushed model from the Hub with from_pretrained, train it some more, and push it again.
As a precaution, copy the model that has already been trained to another repo first; that way you still have a good copy even if the new training run fails.
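
Here is a rough sketch of what that could look like with the Trainer API. It is only an illustration under some assumptions: the repo id, the "-backup" suffix, the output directory and the helper names `backup_repo_name` and `resume_from_hub` are made up for the example, the tokenizer is assumed to have been pushed to the same repo as the model, `train_dataset` is assumed to be already tokenized, and you need to be logged in to the Hub with a write token.

```python
def backup_repo_name(repo_id: str, suffix: str = "-backup") -> str:
    """Hypothetical naming scheme for the precautionary copy of the repo."""
    return repo_id + suffix


def resume_from_hub(repo_id, train_dataset, extra_epochs=1,
                    output_dir="/kaggle/working/resume-run"):
    """Load pushed weights, back them up, train further, push again."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    # Downloads the config and safetensors weights from the Hub repo.
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    # Assumption: the tokenizer files live in the same repo as the model.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    if tokenizer.pad_token is None:
        # GPT-2 ships without a pad token; the collator needs one to pad batches.
        tokenizer.pad_token = tokenizer.eos_token

    # Precaution: keep a copy of the current weights in a second repo.
    backup_id = backup_repo_name(repo_id)
    model.push_to_hub(backup_id)
    tokenizer.push_to_hub(backup_id)

    args = TrainingArguments(
        output_dir=output_dir,          # assumed Kaggle working directory
        num_train_epochs=extra_epochs,  # epochs to add on top of the earlier run
        push_to_hub=True,
        hub_model_id=repo_id,           # push back to the original repo
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        # mlm=False gives causal language modeling labels.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.push_to_hub()
    return trainer
```

You would call it with something like `resume_from_hub("your-username/your-gpt2", train_dataset, extra_epochs=2)`. Keep in mind that this restores the weights only: the optimizer state and learning-rate schedule start fresh unless a full Trainer checkpoint was also saved somewhere you can download it from.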