I am trying to train a RoBERTa model on a large corpus, on a server with a time limit.
Is there any way to save the model every 3000 steps or so, to keep a record of the training and resume it later?
I really need this for my project. Thanks for helping.
You can set it in the training config:
(save_steps (int, optional, defaults to 500) – Number of updates steps before two checkpoint saves.)
So for your case: “save_steps”: 3000
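
For the "resume it later" part, here is a rough sketch, not a definitive recipe. It assumes you train with the transformers Trainer, which writes each checkpoint to a folder named checkpoint-<step> inside output_dir. The output path and the helper find_latest_checkpoint are made up for illustration; only save_steps comes from the docs quoted above.

```python
import os
import re
from typing import Optional

# Illustrative config: only "save_steps" is taken from the reply above.
# The output_dir path is an assumption; use whatever your project uses.
training_config = {
    "output_dir": "./roberta-checkpoints",
    "save_steps": 3000,
}

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> folder with the highest step, or None.

    Hypothetical helper written for this post; it only inspects folder names.
    """
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Ignore stray files and folders that do not follow the naming scheme.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# With transformers installed, the pieces would be wired up roughly like this
# (model and dataset are placeholders for your own objects):
#   args = TrainingArguments(**training_config)
#   trainer = Trainer(model=model, args=args, train_dataset=dataset)
#   trainer.train(resume_from_checkpoint=find_latest_checkpoint(args.output_dir))
```

If no checkpoint folder exists yet, the helper returns None and training starts from the beginning, so the same script can be used for the first run and for every restart after the server cuts the job off.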