Can I resume training from a model that's been pushed to the hub?

benjamintli · December 26, 2024, 4:15pm

Hey! I have a GPT-2 model I’m pretraining, and (due to being GPU poor) I’m using Kaggle and running the script in the background. I trained it for 2 epochs and I want to continue for more, but because it runs on Kaggle, the directories don’t persist between runs. Is it possible to continue training from what’s in the Hugging Face repo that the model got pushed to, for example from the safetensors file that’s been pushed there?

John6666 · December 27, 2024, 6:38am

Of course it is possible. Just load the trained model from your Hub repo, train it again, and upload the result.
If you first copy the successfully trained model to another repo as a precaution, you still have a good copy even if the new training run fails.
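
A minimal sketch of that load, train, and upload loop, assuming the `transformers` Trainer API is being used. The repo id `your-username/your-gpt2`, the helper names, and the `train_dataset` argument are hypothetical placeholders; the dataset is assumed to be tokenized already and to include `labels`, and pushing assumes you are logged in with a write token. Note that a repo holding only the final safetensors weights restores the model weights but not the optimizer or learning-rate scheduler state, so this continues training from the saved weights rather than resuming the exact earlier run.

```python
from typing import Optional


def load_model_from_hub(repo_id: str, revision: Optional[str] = None):
    """Download the pushed weights (e.g. model.safetensors) from a Hub repo."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)


def continue_training(
    repo_id: str,
    train_dataset,
    extra_epochs: float = 1.0,
    output_dir: str = "resumed-run",
):
    """Train for more epochs starting from the Hub weights, then push back."""
    from transformers import Trainer, TrainingArguments

    model = load_model_from_hub(repo_id)

    args = TrainingArguments(
        output_dir=output_dir,          # local scratch dir; not persisted on Kaggle
        num_train_epochs=extra_epochs,  # additional epochs, not a running total
        push_to_hub=True,
        hub_model_id=repo_id,           # push back to the same repo (assumption)
    )

    # train_dataset is assumed to be tokenized and to contain a "labels" column.
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.push_to_hub()
    return trainer


# Hypothetical usage:
# continue_training("your-username/your-gpt2", train_dataset=my_tokenized_dataset)
```

To follow the backup suggestion, you could push the current weights to a second repo before calling `continue_training`, for example with `model.push_to_hub("your-username/your-gpt2-backup")` (again a placeholder name).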
