Batch size vs gradient accumulation

Isn't it the opposite? Batching inputs is what drives memory usage up, not gradient accumulation. If gradient accumulation triggers an OOM (out-of-memory) error, the equivalent single large batch is guaranteed to fail with the same error, since accumulation only ever holds one micro-batch's activations at a time while the full batch holds all of them at once.
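To illustrate the equivalence, here's a framework-free sketch (plain Python, toy linear model with a made-up MSE loss, all names hypothetical) showing that accumulating gradients over micro-batches reproduces the full-batch gradient while only ever processing one micro-batch at a time:

```python
def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over the given batch
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full batch: all 4 samples processed at once (higher peak memory).
full_grad = grad_mse(w, xs, ys)

# Gradient accumulation: two micro-batches of 2; each micro-batch's
# gradient is scaled by 1/accum_steps and summed, mimicking loss / accum_steps.
accum_steps = 2
accum_grad = 0.0
for i in range(0, len(xs), 2):
    accum_grad += grad_mse(w, xs[i:i + 2], ys[i:i + 2]) / accum_steps

print(full_grad, accum_grad)  # both -22.5: identical gradients
```

The two gradients match exactly, which is why accumulation can substitute for a large batch when the large batch would OOM, but never the other way around.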
