Multinode worse performance than single node with same settings

OK, I fixed this issue.

The random seed was set to the same value on every GPU.

This meant the dropout masks were identical on every device, which led to large and funky gradients.
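
For anyone hitting the same thing, here is a rough sketch of the fix: offset a shared base seed by the process rank. It uses only the standard library, so `dropout_mask` is a toy stand-in for real dropout, and `per_rank_seed` is a name I made up for illustration. I'm assuming the rank comes from the `RANK` environment variable, which launchers like torchrun set; your setup may expose it differently.

```python
import os
import random


def per_rank_seed(base_seed: int, rank: int) -> int:
    """Hypothetical helper: derive a distinct seed per process from a shared base seed."""
    return base_seed + rank


def dropout_mask(seed: int, n: int = 8, p: float = 0.5) -> list:
    """Toy stand-in for a dropout mask: 1 keeps a unit, 0 drops it."""
    rng = random.Random(seed)
    return [0 if rng.random() < p else 1 for _ in range(n)]


base_seed = 1234

# Assumption: the launcher exports RANK; fall back to 0 for a single process.
rank = int(os.environ.get("RANK", "0"))
seed = per_rank_seed(base_seed, rank)

# In a real training script this seed would go to the framework's seeding
# call (for example torch.manual_seed in PyTorch) instead of random.seed.
random.seed(seed)

# Same seed on two ranks gives the same mask (the bug described above) ...
print(dropout_mask(base_seed) == dropout_mask(base_seed))

# ... while rank-offset seeds give each rank its own random stream.
print(dropout_mask(per_rank_seed(base_seed, 0)))
print(dropout_mask(per_rank_seed(base_seed, 1)))
```

One caveat: anything that has to match across ranks (for example the shuffle order used by a distributed sampler) should still use the shared base seed, not the per-rank one.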

I think this is a gotcha for a lot of people. Maybe you should handle it internally?
