I'm using Trainer to handle finetuning a GPT2 model. I see in TrainingArguments there is a max_steps argument that overrides num_train_epochs.
For a batch size of 32, is setting max_steps=1000000 the equivalent of setting num_train_epochs so that the model sees 1,000,000 × 32 = 32,000,000 examples?
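To spell out the arithmetic I'm assuming (the dataset size here is a made-up placeholder, not my actual data):

```python
# Hypothetical numbers to illustrate the steps <-> epochs conversion I have in mind.
dataset_size = 4_000_000   # placeholder; my real dataset differs
batch_size = 32
max_steps = 1_000_000

examples_seen = max_steps * batch_size            # total examples processed over training
steps_per_epoch = dataset_size // batch_size      # optimizer steps in one pass over the data
equivalent_epochs = examples_seen / dataset_size  # how many epochs max_steps corresponds to

print(examples_seen, steps_per_epoch, equivalent_epochs)
```

With these placeholder numbers, 1,000,000 steps would work out to 8 full epochs. Is that the right way to think about it?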
Also, what happens if I have a batch size of 6 but my number of training examples isn't divisible by 6? Does Trainer stop training after the nearest divisible whole number of examples, or does it change the batch size for the final batch?
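To make that second question concrete, here's the arithmetic with hypothetical numbers (100 examples is just for illustration):

```python
import math

# Hypothetical numbers: 100 examples with a batch size of 6 doesn't divide evenly.
dataset_size = 100
batch_size = 6

full_batches = dataset_size // batch_size   # complete batches of 6
remainder = dataset_size % batch_size       # examples left over after the full batches
# If the trailing partial batch is kept, there is one extra, smaller batch:
batches_if_partial_kept = math.ceil(dataset_size / batch_size)

print(full_batches, remainder, batches_if_partial_kept)
```

So with 100 examples I'd get 16 full batches plus 4 leftover examples, and I'm asking whether those 4 are trained on as a smaller batch or skipped.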
Thanks in advance!!