How to find the number of GPUs being used for training?

rishikesh · April 29, 2022, 4:06am

I am using the Hugging Face Accelerator and have rented GPUs from AWS, which I access through another PaaS called ‘dstack.ai’. I have several GPUs at my disposal, but for training I choose to run on only 2 or 4 of them. I want to make sure those GPUs are actually being used during training. Is there any way to know how many GPUs the Accelerator is using while training? Or is there a command I can add to my PyTorch training script to print the number of GPUs being used?

BramVanroy · April 29, 2022, 7:47am

Check how many devices are available to torch (not necessarily how many are used): torch.cuda.device_count()
Check how many GPUs the HF Trainer is using (if you use that): trainer.args._n_gpu. This only reflects the full count when using DataParallel; with DistributedDataParallel it will be 1, since each process drives a single GPU.
I don’t have much experience with Accelerator, but I think you can get the world size (the total number of training processes) with accelerator.state.num_processes
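
The checks above can be combined into one small helper for a training script. This is only a minimal sketch, not an official recipe: the function name gpu_report and its dictionary keys are made up for illustration, it assumes you pass in your own existing Accelerator instance, and it reads accelerator.num_processes, which should report the same value as accelerator.state.num_processes.

```python
import os


def gpu_report(accelerator=None):
    """Collect what can be determined about GPU usage (hypothetical helper).

    `accelerator` is assumed to be an existing `accelerate.Accelerator`
    instance; when it is None, only the torch/environment info is returned.
    """
    report = {
        # GPUs exposed to this process by the launcher or platform, if set
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }

    try:
        import torch

        # Devices torch can see: available, not necessarily all in use
        report["visible_gpus"] = torch.cuda.device_count()
    except ImportError:
        report["visible_gpus"] = None  # torch is not installed

    if accelerator is not None:
        # World size: one process per GPU in a typical multi-GPU launch
        report["num_processes"] = accelerator.num_processes
        # Rank of this process and the device it was assigned
        report["process_index"] = accelerator.process_index
        report["device"] = str(accelerator.device)

    return report


# In a training script, assuming `accelerator = Accelerator()` already exists:
#     if accelerator.is_main_process:
#         print(gpu_report(accelerator))
```

If num_processes matches the 2 or 4 GPUs you selected and each process reports a different cuda device, the run is most likely using the GPUs you intended. A value of 1 usually means the script was started as a single process instead of through a multi-process launcher.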
