Hey @BramVanroy, thank you for your reply. I have found the notebook, sorry for not being more precise earlier. It's *Reformer - Pushing the Limits of Language Modeling*. Around cell 7 it says:
> We then expand the same sample to 8 training samples so that we can accumulate gradients during training.
And in the code:

```python
# duplicate data 8 times to have 8 examples in the dataset
for key in input_ids_dict.keys():
    input_ids_dict[key] = [8 * [x] for x in input_ids_dict[key]][0]
```
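If it helps to see what that line actually does, here is a minimal standalone sketch (with a toy `input_ids_dict`; I'm assuming each value is a single-element list holding one tokenized sample, as produced earlier in the notebook):

```python
# Toy stand-in for the notebook's dict: one tokenized sample per key.
input_ids_dict = {"input_ids": [[5, 6, 7]]}

for key in input_ids_dict.keys():
    # 8 * [x] builds a list of 8 copies of the single sample;
    # the comprehension wraps it in one more list, and [0] unwraps it.
    input_ids_dict[key] = [8 * [x] for x in input_ids_dict[key]][0]

print(len(input_ids_dict["input_ids"]))  # 8 -- eight copies of the same sample
assert all(x == [5, 6, 7] for x in input_ids_dict["input_ids"])
```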
But `gradient_accumulation_steps` is actually set to 4, not 8 as I would expect, with a batch size of 1.
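For reference, this is how I read the relevant training setup (just a sketch of the two arguments in question, not the notebook's full config; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./reformer-output",   # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
)

# Effective batch per optimizer update = 1 * 4 = 4 samples,
# so the 8 duplicated examples yield 2 optimizer steps per epoch,
# not 1 as the "expand to 8 samples" comment led me to expect.
effective_batch = (training_args.per_device_train_batch_size
                   * training_args.gradient_accumulation_steps)
print(effective_batch)  # 4
```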