Explicitly set number of training steps using Trainer

I’m using Trainer to handle fine-tuning a GPT2 model. I see that TrainingArguments has a max_steps argument that overrides num_train_epochs.

For a batch size of 32, is setting max_steps=1000000 the equivalent of setting num_train_epochs=31250?

Also, what happens if I have a batch size of 6 but want to set max_steps=1000000? Does Trainer stop training after the nearest evenly divisible number of steps, or does it change the batch size at the end?

Thanks in advance!!

The number of steps is the number of update steps, not the number of training examples seen.
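
For example, a minimal sketch of where max_steps goes (the values here are placeholders, not recommendations):

```python
from transformers import TrainingArguments

# Minimal sketch with placeholder values: max_steps caps the number of
# optimizer update steps and takes precedence over num_train_epochs.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=32,
    num_train_epochs=3,    # ignored, because max_steps is set
    max_steps=1_000_000,   # training stops after 1,000,000 update steps
)
```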

Ok. Is it then the case that for a batch size of 32, setting max_steps=1000000 is the same as setting num_train_epochs=31250?

No (unless your dataset happens to contain exactly 32 batches, i.e. 32 * 32 = 1024 examples). In general, num_train_epochs = max_steps / len(train_dataloader).
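
To make the arithmetic concrete, here is a small sketch (the dataset size of 1024 is an assumption chosen so the numbers match the example above):

```python
import math

# Assumed numbers for illustration: 1024 examples at batch size 32
# gives exactly 32 batches per epoch, i.e. len(train_dataloader) == 32.
num_examples = 1024
batch_size = 32
max_steps = 1_000_000

steps_per_epoch = math.ceil(num_examples / batch_size)  # 32
equivalent_epochs = max_steps / steps_per_epoch         # 31250.0
print(equivalent_epochs)
```

With any other dataset length the equivalence breaks down: e.g. 100000 examples at batch size 32 gives 3125 steps per epoch, so max_steps=1000000 would correspond to 320 epochs instead.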

Apologies for my confusion. Is max_steps then the maximum number of times a batch makes a forward and backward pass through the network?

Not exactly: it is the number of update steps (so if you’re using gradient accumulation, it’s a bit different from just counting forward and backward passes).
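
As a sketch of that distinction on a single device (the accumulation value of 4 is an assumption for illustration):

```python
# Assumed values: with gradient accumulation, one update step consumes
# gradient_accumulation_steps batches, so the number of forward/backward
# passes per update step is gradient_accumulation_steps, not 1.
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
max_steps = 1_000_000

batches_per_update = gradient_accumulation_steps                         # 4
examples_per_update = per_device_train_batch_size * batches_per_update   # 24
total_forward_backward = max_steps * batches_per_update                  # 4,000,000
total_examples_seen = max_steps * examples_per_update                    # 24,000,000
```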
