Hi everyone,
I’m pretty new to this. I’m trying to train a transformer model on a GPU using transformers.Trainer.
I’m doing my prototyping at home on a Windows 10 machine with a 4-core CPU and a GTX 1060. I have my data, model, and Trainer all set up, and my dataset is an instance of torch.utils.data.Dataset. Based on what I see in Task Manager, it looks like Trainer might not be prefetching data to keep the GPU busy at all times. Here’s what I see during training:
As you can see, GPU usage maxes out around 55% and regularly drops back to 0% during training.
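For reference, here’s roughly what my setup looks like. It’s simplified: the actual model, dataset contents, and hyperparameters differ, and the names below are just placeholders.

```python
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


class MyDataset(Dataset):
    """Simplified stand-in for my real torch.utils.data.Dataset."""

    def __init__(self, texts, labels, tokenizer, max_length=128):
        # Tokenize everything up front so __getitem__ just indexes tensors.
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding="max_length",
            max_length=max_length,
            return_tensors="pt",
        )
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Dummy data here; my real data is loaded from disk.
texts = ["an example sentence"] * 32
labels = [0] * 32
train_dataset = MyDataset(texts, labels, tokenizer)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    # I left dataloader_num_workers at its default.
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```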
Iterating over my entire dataset on its own takes roughly a tenth of the time Trainer needs for one training epoch, so I don’t think data loading itself is the bottleneck.
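This is roughly how I timed that comparison (again simplified; `train_dataset` and the batch size are the same as in the sketch above):

```python
import time

from torch.utils.data import DataLoader

# Do one full pass over the dataset with the same batch size Trainer uses,
# without any forward/backward pass, and compare the wall-clock time
# against the time Trainer takes for one epoch.
loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

start = time.perf_counter()
for batch in loader:
    pass  # just pull batches, no model work
elapsed = time.perf_counter() - start
print(f"One full pass over the data: {elapsed:.1f}s")
```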
So, any ideas why I’m seeing this kind of GPU behavior? Is Trainer not prefetching the data for each training step, or is there some other issue with my code?