Gradient accumulation: should I duplicate data?

Ideally, gradient accumulation has nothing to do with the data, so there is no need to duplicate it. You simply accumulate the gradients from several mini-batches in memory and then perform a single optimizer update, which has the same effect as training with a larger batch size.
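A minimal NumPy sketch of the idea (the linear model and data here are made up for illustration): summing appropriately scaled micro-batch gradients reproduces the full-batch gradient, so the same data passes through once and nothing is duplicated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # 8 samples, 3 features (toy data)
y = rng.normal(size=8)
w = np.zeros(3)

def grad(Xb, yb, w):
    # Gradient of mean-squared-error loss for a linear model w.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# One gradient over the full batch of 8 samples.
full = grad(X, y, w)

# Gradient accumulation: 4 micro-batches of 2 samples each.
# Each micro-batch gradient is scaled by 1/accumulation_steps
# (the same as dividing the loss by the number of steps).
steps = 4
accum = np.zeros(3)
for Xb, yb in zip(np.split(X, steps), np.split(y, steps)):
    accum += grad(Xb, yb, w) / steps

assert np.allclose(full, accum)  # identical to the full-batch gradient
```

In a framework like PyTorch this corresponds to calling `loss.backward()` on each micro-batch (gradients add up in `.grad`) and only calling `optimizer.step()` and `optimizer.zero_grad()` once every `steps` micro-batches.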
