Gradient accumulation: should I duplicate data?

Yeah, this might have been a bit imprecise in the notebook. The reason I'm expanding the training data from 1 to 8 samples is a super edge case: since Reformer processes the whole train dataset in one batch, there is only a single sample in the entire dataset. If one then uses gradient accumulation (which, as correctly pointed out, has nothing to do with data replication), a bug appears, because the training script rightfully expects the dataset to have more than one training sample when gradient accumulation is used. So my solution of expanding the dataset is more of a hack than the recommended way of doing it; in general one should never copy samples from the dataset.
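
Just to make the distinction concrete, here is a minimal sketch of what gradient accumulation actually does (plain PyTorch, not the notebook's actual Trainer code; the model, dataset, and `accumulation_steps` value are placeholders): the data is simply split into smaller mini-batches and the gradients are summed before a single optimizer step, so no sample ever needs to be duplicated.

```python
import torch
from torch.utils.data import DataLoader

# Placeholder model, optimizer, and toy dataset -- stand-ins for whatever
# the training script actually uses.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = [(torch.randn(16), torch.tensor(0)) for _ in range(8)]
loader = DataLoader(dataset, batch_size=2)

accumulation_steps = 4  # sum gradients over 4 mini-batches before one update

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    # Scale the loss so the accumulated gradient matches one big batch of 8.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # single parameter update after 4 backward passes
        optimizer.zero_grad()
```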
I doubt there is any real application for a dataset that consists of a single batch; the notebook was more of a showcase that Reformer can process the whole dataset in one batch, so it's not super relevant for real scenarios.