Does Trainer prefetch data?

Hi everyone,

I’m pretty new to this. I’m trying to train a transformer model on a GPU using transformers.Trainer.

I’m doing my prototyping at home on a Windows 10 machine with a 4-core CPU and a GTX 1060. I have my data, model, and trainer all set up, and my dataset is of type torch.utils.data.Dataset. Based on what I see in Task Manager, it looks like Trainer might not be prefetching data to keep the GPU busy at all times. Here’s what I see during training:

[Screenshot: Task Manager graph of GPU utilization during training]

As you can see, GPU usage maxes out around 55%, and cycles down to 0% regularly during training.

I can iterate through my whole dataset around 10x faster than Trainer takes to train the model for one epoch, so I don’t think data loading is the bottleneck.
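For context, the kind of iteration I’m timing is roughly this (a simplified sketch; the toy dataset, batch size, and sequence length below are placeholders standing in for my actual setup):

```python
import time

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Placeholder standing in for my real torch.utils.data.Dataset."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return {
            "input_ids": torch.randint(0, 30_000, (128,)),
            "labels": torch.tensor(0),
        }

loader = DataLoader(ToyDataset(), batch_size=16)

start = time.time()
for _ in loader:  # just pull batches; no model or GPU involved
    pass
print(f"One full pass over the data took {time.time() - start:.1f}s")
```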

So, any ideas why I am seeing this type of behavior with my GPU? Is Trainer not prefetching the data for each training step, or is it some other issue with my code?


In case anyone else is wondering about this, I figured it out.

Trainer indeed appears to prefetch data. The problem was that my data loader was too slow to keep up with the GPU.

After optimizing my data loading routine, I’m able to keep the GPU busy constantly.
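For anyone who finds this later: Trainer feeds the model through a regular torch DataLoader, and TrainingArguments exposes settings for how many CPU worker processes prepare batches. A minimal sketch of the relevant arguments (the values here are just examples, not a recommendation):

```python
from transformers import TrainingArguments

# dataloader_num_workers and dataloader_pin_memory are the standard knobs
# TrainingArguments exposes for Trainer's internal torch DataLoader.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    dataloader_num_workers=4,    # use 4 CPU worker processes to prepare batches
    dataloader_pin_memory=True,  # pin host memory to speed up CPU-to-GPU copies
)
```

One caveat for Windows users like me: DataLoader worker processes are spawned rather than forked, so the training script needs the usual `if __name__ == "__main__":` guard when num_workers is greater than 0.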

I have the same problem with fluctuating GPU utilization. Could you explain more about what you did to optimize your data loading routine?

My problem was that I was doing data augmentation and tokenization on the fly during training, which was too slow to keep up with the GPU. My solution was to do the augmentation and tokenization ahead of time (on the CPU) and write the results to temp files; then during training I just load the preprocessed tensors from those files and feed them to the GPU.
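In case a concrete example helps, the overall pattern looks roughly like this (a heavily simplified sketch; the tokenizer name, example texts, and file layout are placeholders rather than my actual pipeline, and the augmentation step is omitted):

```python
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer

# --- ahead of time, on CPU: tokenize once and save the tensors to disk ---
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["example sentence one", "example sentence two"]  # your raw data
labels = [0, 1]

encodings = tokenizer(texts, truncation=True, padding="max_length",
                      max_length=128, return_tensors="pt")
torch.save({"encodings": encodings, "labels": torch.tensor(labels)},
           "preprocessed.pt")

# --- at training time: just read the tensors back, no tokenization needed ---
class PreprocessedDataset(Dataset):
    def __init__(self, path):
        data = torch.load(path)
        self.encodings = data["encodings"]
        self.labels = data["labels"]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

train_dataset = PreprocessedDataset("preprocessed.pt")  # pass this to Trainer
```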