Riiight, excellent. I used max_steps and didn't notice that the effective epoch count doubles with 2 GPUs. Awesome.
While we're at it: I read Thomas' article about GPU training, where he advocates computing the loss in parallel on each GPU. But I don't see support for that in the trainer. Was it not worth it?
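For context, the idea is that each GPU computes the loss on its own slice of the batch and only the scalar losses get averaged, instead of gathering every logit onto one device first. A minimal pure-Python sketch (hypothetical numbers, MSE just for simplicity, not the trainer's actual API) of why the two are equivalent when the shards are equal-sized:

```python
# Toy illustration: per-shard loss averaging vs. one loss over the full batch.

def mse(preds, targets):
    """Mean squared error over a batch."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

preds   = [0.2, 0.9, 0.4, 0.7]   # hypothetical model outputs
targets = [0.0, 1.0, 0.0, 1.0]   # hypothetical labels

# Single-device style: one loss computed over the whole batch.
full_loss = mse(preds, targets)

# "Two GPUs": each computes the loss on its half of the batch;
# only the two scalars are gathered and averaged.
shard_losses = [mse(preds[:2], targets[:2]), mse(preds[2:], targets[2:])]
parallel_loss = sum(shard_losses) / len(shard_losses)

print(full_loss, parallel_loss)  # identical for equal-sized shards
```

The memory win is that only a scalar per device crosses the boundary, rather than the full logit tensor; the equivalence above only holds exactly when every shard has the same size (an uneven last batch makes the average of averages differ slightly).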