Ideally gradient accumulation has nothing to do with data . It’s basically , in storage memory of few epochs and then do gradient update, which will have an effect of larger batch size.
1 Like
Ideally gradient accumulation has nothing to do with data . It’s basically , in storage memory of few epochs and then do gradient update, which will have an effect of larger batch size.