I am trying to understand the math behind how max steps is calculated when it is left unset. I’ve tried to work backwards by changing the number of epochs, the batch size, and the micro-batch size to see if I could figure out the formula, but I haven’t had any luck. I have also looked at the documentation for the transformers Trainer but haven’t been able to find it there. I’d like to be able to predict max steps from the number of epochs, the batch size, the micro-batch size, and the number of samples in the dataset.

Hopefully I’m just missing something simple. Thanks for your help!

As a newbie, I was just trying to figure out these relations. I don’t know about micro-batching, but I wanted epoch-based calculations.

```
evaluation_strategy = "epoch",
save_strategy = "epoch",
per_device_train_batch_size = 64,
gradient_accumulation_steps = 4,
per_device_eval_batch_size = 64,
```

My train set had 31,091 samples, and these settings gave me 605 steps. So I “back-propagated”…

```
epochs = 5
train_size = 31091
train_batch_size = 64
ga_steps = 4
virtual_batch_size = train_batch_size * ga_steps # "invented name" => 256
per_epoch_steps = int(train_size / virtual_batch_size + 0.5) # round => 121
total_steps = epochs * per_epoch_steps # => 605
```

If I’m not mistaken…
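
For comparison, here is a rough sketch of the same estimate done batch by batch rather than by rounding. It is not taken from the Trainer documentation. It assumes the last, smaller batch of each epoch is kept, and that one optimizer step is counted per full group of `ga_steps` batches. The function name and the `num_devices` and `count_partial_group` parameters are made up for illustration. Some library versions may also count the leftover partial group, so check the source of the version you have installed.

```python
import math

def estimate_max_steps(train_size, per_device_batch_size, ga_steps, epochs,
                       num_devices=1, count_partial_group=False):
    """Estimate total optimizer steps (hypothetical helper, not a library API)."""
    # Batches per epoch, assuming the final smaller batch is not dropped.
    batches_per_epoch = math.ceil(train_size / (per_device_batch_size * num_devices))
    if count_partial_group:
        # Assumption: a leftover group of fewer than ga_steps batches still counts as a step.
        steps_per_epoch = math.ceil(batches_per_epoch / ga_steps)
    else:
        # Assumption: only full groups of ga_steps batches count as a step.
        steps_per_epoch = batches_per_epoch // ga_steps
    steps_per_epoch = max(steps_per_epoch, 1)
    return math.ceil(epochs * steps_per_epoch)

print(estimate_max_steps(31091, 64, 4, 5))  # => 605
```

With the numbers above this gives 486 batches per epoch, 121 steps per epoch, and 605 steps in total, the same as the rounding version. The two can disagree for other dataset sizes: with 31,150 samples, rounding gives 122 steps per epoch while this sketch gives 121.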