How to find the number of GPUs being used for training?

I am using Hugging Face Accelerate and have rented GPUs from AWS. I access the GPUs through another PaaS called ‘dstack.ai’. I have several GPUs at my disposal, but while training I select only 2 or 4 of them. I want to verify that those GPUs are actually being used during training. Is there any way to know how many GPUs Accelerate is using while training? Or is there a command I can add to my PyTorch training script to print the number of GPUs being used?

  • Check how many devices are visible to PyTorch (not necessarily how many are used): torch.cuda.device_count()
  • Check how many GPUs the HF Trainer is using (if you use that): trainer.args._n_gpu (this is only meaningful with DataParallel; it will be 1 under DistributedDataParallel, where each process owns a single GPU)
  • I don’t have much experience with Accelerate, but I think you can get the world size with accelerator.state.num_processes