Gradient accumulation: should I duplicate data?

Yeah, this might have been a bit imprecise in the notebook. The reason I'm expanding the training data from 1 to 8 samples is a super edge case: since Reformer processes the whole train dataset in one batch, there is only a single sample in the entire dataset. If one then uses gradient accumulation (which, as correctly pointed out, has nothing to do with data replication), a bug appears, because the training script rightfully expects the dataset to have more than one training sample when gradient accumulation is used. So my solution of expanding the dataset is more of a hack than the recommended way of doing it; in general one should never copy samples from the dataset.
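
Just to make the distinction concrete, here is a minimal sketch of what gradient accumulation actually does (plain PyTorch, not the notebook's actual Trainer code; the model, dataset, and `accumulation_steps` value are placeholders): the data is simply split into smaller mini-batches and the gradients are summed before a single optimizer step, so no sample ever needs to be duplicated.

```python
import torch
from torch.utils.data import DataLoader

# Placeholder model, optimizer, and toy dataset -- stand-ins for whatever
# the training script actually uses.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = [(torch.randn(16), torch.tensor(0)) for _ in range(8)]
loader = DataLoader(dataset, batch_size=2)

accumulation_steps = 4  # sum gradients over 4 mini-batches before one update

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    # Scale the loss so the accumulated gradient matches one big batch of 8.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # single parameter update after 4 backward passes
        optimizer.zero_grad()
```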
I doubt there is any real application for a dataset that consists of a single batch; the notebook was more of a showcase that Reformer can process the whole dataset in one batch, so it's not super relevant for real scenarios.