Hey folks,
I am pre-training a RoBERTa model on the C4 corpus on a g5.xlarge EC2 instance using PyTorch and an Adam8bit optimizer. The instance has an A10G rated at about 30 TFLOPS (NVIDIA A10G Specs | TechPowerUp GPU Database).
How do I estimate how well I am utilizing the GPU? Here is a training run on wandb: Weights & Biases
Dividing total_flos by total wall-clock time gives approximately 6 TFLOPS, or about 20% utilization, which feels low.
Using the 6ND formula (training FLOPs ≈ 6 × parameters × tokens) also gives a number in the same ballpark.
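For what it's worth, here is a small sketch of the calculation I'm doing. The specific numbers (parameter count, token count, wall time) are placeholders, not values from my actual run; the peak is the A10G's ~30 TFLOPS figure:

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs via the 6ND rule of thumb."""
    return 6 * n_params * n_tokens

def mfu(n_params: float, n_tokens: float, wall_seconds: float,
        peak_flops_per_sec: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s divided by hardware peak."""
    achieved = train_flops(n_params, n_tokens) / wall_seconds
    return achieved / peak_flops_per_sec

# Example with made-up numbers: a ~125M-param model, 8B tokens,
# 1e6 seconds of training, against a 30 TFLOPS peak.
print(mfu(125e6, 8e9, 1e6, 30e12))  # ~0.2, i.e. ~20% utilization
```

Swapping in my run's real numbers is how I got the ~20% figure above.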
Am I thinking about this right? Pointers much appreciated.
Cheers,
–sr