Hi there, newbie question. I'm training my first model (GPT-2) and the trainer pauses every 500 steps to save a checkpoint. My impression is that this slows training down quite a bit. Is that right, or is the overhead usually negligible? And, more importantly, are there community best practices for maximizing training speed while still keeping good logs and checkpoints?
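For context, my setup looks roughly like this. I'm assuming the Hugging Face `Trainer` here since that's what I'm using; the output path and step values are just placeholders:

```python
from transformers import TrainingArguments

# Rough sketch of my current arguments (values are placeholders)
args = TrainingArguments(
    output_dir="gpt2-finetune",  # placeholder output path
    save_strategy="steps",
    save_steps=500,              # this is where the pauses happen
    logging_steps=50,            # how often metrics get logged
)
```

Happy to share more of the script if it helps.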
Thanks for your help,