Getting GPU info from Accelerate

Is it possible to print which GPU a model is being trained on when using Accelerate, as well as how many GPUs will be used for training?

You should look at the output of accelerate env from the CLI. You can also configure this yourself by running accelerate config before training.

Otherwise it’s highly dependent on how you start the script. E.g. if you call torchrun itself and use accelerate in the script, it’ll use all of the GPUs available.

How are you calling your script?

@muellerzr Thank you for your response!

I launch my script, tmp.py, with: accelerate launch tmp.py

Then it would be whatever accelerate env shows as configured. If that hasn’t been configured yet, then most likely it’s using all of your GPUs. (Though I think accelerate launch will give you an error if you haven’t configured it yet.)

I did run accelerate config to set things up to use all 4 GPUs. Let me rephrase my question. I’d like to log the total number of GPUs Accelerate is using from within my Python script tmp.py, for informational/debugging purposes. Is it possible to get that information from the Accelerator object created in the script?


Yes, for that you’d want the following information in Accelerator:

Accelerator.num_processes and Accelerator.distributed_type.

You can also gather these from the AcceleratorState class by doing:

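from accelerate.state import AcceleratorState

# AcceleratorState shares its state with the Accelerator in this process
state = AcceleratorState()
num_devices, device_kind = state.num_processes, state.distributed_type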

(This does the same thing, so you should probably just read them from the accelerator object you have already created.)

distributed_type returns a DistributedType, which is an enum. Calling str(device_kind) gives a string that looks like the following:

'DistributedType.MULTI_GPU'
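Putting that together for logging from inside tmp.py, here is a minimal sketch. The helper names describe_run and main are made up for illustration; the attributes it reads (num_processes, distributed_type, process_index, device) and accelerator.print come from the Accelerator object. Note that num_processes counts processes, which in a multi-GPU run is normally one per GPU.

```python
def describe_run(accelerator):
    """Summarise the distributed setup of an Accelerator-like object in one line.

    describe_run is a hypothetical helper; it only reads attributes,
    so any object exposing the same four names will work.
    """
    return (
        f"distributed_type={str(accelerator.distributed_type)} "
        f"num_processes={accelerator.num_processes} "
        f"process_index={accelerator.process_index} "
        f"device={accelerator.device}"
    )


def main():
    # Imported here so the helper above can be used without accelerate installed.
    from accelerate import Accelerator

    accelerator = Accelerator()
    # accelerator.print only prints on the main process, so this shows up once.
    accelerator.print(describe_run(accelerator))
    # A plain print runs in every process and shows which device each one got.
    print(f"[rank {accelerator.process_index}] using {accelerator.device}")
```

Calling main() from tmp.py and starting it with accelerate launch tmp.py should print the summary once, followed by one line per process.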

That works like a charm!! Thank you!
