Measuring training speed

streetsonthemountain · November 29, 2022, 8:58pm

Hey folks,

I am pre-training a RoBERTa model on the C4 corpus on a g5.xlarge EC2 instance using PyTorch and an Adam8bit optimizer. The instance has an A10G that is rated at 30 TFLOPS (NVIDIA A10G Specs | TechPowerUp GPU Database).

How do I estimate how well I am using the GPU? Here is a training run on wandb: Weights & Biases

Dividing total_flos by the total training time gives approximately 6 TFLOPS, or about 20% utilization, which feels low.
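
For reference, the arithmetic behind that figure looks roughly like this. It is a minimal sketch: the function name and the example inputs are placeholders made up for illustration, and only the 30 TFLOPS peak and the roughly 6 TFLOPS result come from the numbers above.

```python
def flops_utilization(total_flos: float, elapsed_seconds: float,
                      peak_flops_per_s: float) -> float:
    """Fraction of the GPU's rated peak throughput achieved over a run."""
    achieved_flops_per_s = total_flos / elapsed_seconds
    return achieved_flops_per_s / peak_flops_per_s


# Placeholder inputs (not taken from the actual run):
# 2.16e16 FLOPs over one hour works out to 6e12 FLOP/s.
util = flops_utilization(
    total_flos=2.16e16,
    elapsed_seconds=3600.0,
    peak_flops_per_s=30e12,  # rated peak quoted above for the A10G
)
print(f"utilization: {util:.0%}")
```

Here total_flos is the value reported by the training run, and the elapsed time should cover only training steps, otherwise data loading, evaluation, and checkpointing pull the number down.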

Using the formula 6ND (N = number of model parameters, D = number of training tokens) also gives a number in the same ballpark.
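
The 6ND check can be sketched the same way, assuming N is the parameter count and D the number of tokens processed. The helper name, the 125M parameter count, and the 8,000 tokens per second throughput below are hypothetical values chosen for illustration, not measurements from my run. Note that 6ND ignores the attention term that grows with sequence length, so it slightly undercounts.

```python
def achieved_flops_per_s_6nd(n_params: float, tokens_per_second: float) -> float:
    """Approximate training throughput in FLOP/s using the 6*N*D rule.

    6*N*D estimates forward plus backward compute for D tokens, so
    dividing by time gives 6 * N * (tokens per second).
    """
    return 6.0 * n_params * tokens_per_second


# Hypothetical values for illustration only:
n_params = 125e6            # assumed model size
tokens_per_second = 8000.0  # assumed measured throughput
peak_flops_per_s = 30e12    # rated peak quoted above for the A10G

achieved = achieved_flops_per_s_6nd(n_params, tokens_per_second)
print(f"achieved: {achieved / 1e12:.1f} TFLOPS, "
      f"utilization: {achieved / peak_flops_per_s:.0%}")
```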

Am I thinking about this right? Pointers much appreciated.

Cheers,
–sr
