Hello! I am using gradient accumulation to simulate bigger batches when fine-tuning. However, I remember seeing some notebooks in the documentation that make N copies of the data, where N is the number of gradient accumulation steps. I do not understand why this should be done. Is t…

Gradient accumulation: should I duplicate data?

s4sarath January 19, 2021, 3:38pm 6

Ideally, gradient accumulation has nothing to do with the data. You basically keep the gradients of a few steps (mini-batches, not epochs) in memory and only then do the gradient update, which has the effect of a larger batch size.
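To make that concrete, here is a minimal sketch in plain Python (not the Trainer's actual implementation). It assumes a toy one-parameter model `y_hat = w * x` with mean squared error, and the helper names `grad_mse` and `accumulated_grad` are made up for illustration. It also assumes all micro-batches have the same size. Each sample is used exactly once, with no duplication, and the accumulated gradient matches the gradient of the full batch:

```python
# Toy model: y_hat = w * x, loss = mean squared error over a batch.
def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate gradients over consecutive micro-batches.

    Every sample appears in exactly one micro-batch; nothing is copied.
    Assumes len(data) is a multiple of micro_batch_size.
    """
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    accum_steps = len(micro_batches)
    total = 0.0
    for micro_batch in micro_batches:
        # Dividing by accum_steps plays the role of scaling the loss
        # before the backward pass in a real training loop.
        total += grad_mse(w, micro_batch) / accum_steps
    return total


# Hypothetical data: 8 (x, y) pairs, split into 4 micro-batches of 2.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]
w = 0.5

full = grad_mse(w, data)                               # one big batch of 8
accum = accumulated_grad(w, data, micro_batch_size=2)  # 4 accumulation steps

print(abs(full - accum) < 1e-9)
```

In a real loop the only difference is that the optimizer step (and zeroing of gradients) happens once every `accum_steps` micro-batches instead of after each one.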
