Dataloader fetches slowly using accelerator for distributed training

ezio98 · October 29, 2021, 10:42am

Hi, I am using multiple GPUs with accelerate. The problem I encounter is that if I keep the batch size on each GPU fixed, the training time per step increases sharply as more GPUs are added to training. I broke down the time and traced the slowdown to fetching a data batch from the dataloader. The fetch times are 0.01 s/iter, 0.09 s/iter, and 0.2 s/iter when I use 1, 2, and 4 GPUs, respectively. This hurts efficiency: if I set accumulation_step to 8, around 1.6 s per optimizer update is spent on fetching data in the 4-GPU case. The same machine works fine when I use transformers.Trainer for distributed training, so I think the hardware is fine and I am probably doing something wrong. Does anyone have an idea how to solve it? Looking forward to your reply!
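
For anyone who wants to reproduce the breakdown, here is a rough sketch of how the fetch time can be measured separately from the rest of the step. It is not my actual training script: `timed_batches` and `fetch_overhead_per_update` are hypothetical helper names, and the loader can be any iterable (for example, the dataloader returned by `accelerator.prepare`). Since CUDA calls are asynchronous, the numbers are probably only trustworthy if the GPU is synchronized before each measurement.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed_batches(loader: Iterable[T]) -> Iterator[Tuple[T, float]]:
    """Yield (batch, seconds spent waiting for that batch).

    Hypothetical helper: it only times the call to next(), so the
    forward/backward work done by the caller is not included.
    """
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - start


def fetch_overhead_per_update(fetch_seconds: float, accumulation_steps: int) -> float:
    """Total fetch time per optimizer update under gradient accumulation."""
    return fetch_seconds * accumulation_steps


if __name__ == "__main__":
    # Stand-in for a real dataloader; replace with the prepared dataloader.
    fake_loader = [[0, 1], [2, 3], [4, 5]]
    for step, (batch, waited) in enumerate(timed_batches(fake_loader)):
        print(f"step {step}: waited {waited:.6f}s for batch {batch}")

    # The 4-GPU case from above: 0.2 s per fetch, 8 accumulation steps.
    print(f"{fetch_overhead_per_update(0.2, 8):.1f} s per update")
```

With the 4-GPU figure, 8 fetches at 0.2 s each give the 1.6 s per update mentioned above.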
