I was wondering whether it is possible to print which GPU a model is being trained on when using accelerate, as well as how many GPUs will be used for training?
You should look at the output of accelerate env from the CLI. You can also configure this yourself by running accelerate config before training.
Otherwise it’s highly dependent on how you start the script. E.g. if you call torchrun itself and use accelerate in the script, it’ll use all of the GPUs available.
How are you calling your script?
@muellerzr Thank you for your response!
I launch my script, tmp.py, with accelerate launch tmp.py
Then it would be whatever accelerate env has configured. If that hasn’t been configured yet, then most likely it’s using all of your GPUs. (Though I think accelerate launch will give you an error if you haven’t configured it yet.)
I did run accelerate config to set things up to use all 4 GPUs. Let me rephrase my question. I’d like to be able to log the total number of GPUs accelerate is using from within my Python script tmp.py for informational/debugging purposes. Is it possible to get that information from the Accelerator object created in a Python script?
Yes, for that you’d want the following information in Accelerator: Accelerator.num_processes and Accelerator.distributed_type.
You can also gather these from the AcceleratorState class by doing:
from accelerate.state import AcceleratorState
state = AcceleratorState()
num_devices, device_kind = state.num_processes, state.distributed_type
(This does the same thing; you should probably just grab them from the accelerator object you have made.)
distributed_type will return a DistributedType, which is an enum. You can do str(device_kind) to get a string that looks like the following:
'DistributedType.MULTI_GPU'
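Putting the pieces above together, a minimal sketch of logging this from inside tmp.py might look like the following. The helper name describe_devices and the main function are made up for illustration; num_processes and distributed_type are the attributes mentioned above. The sketch also assumes that accelerator.device, accelerator.process_index and accelerator.is_main_process are available on your version of accelerate, so that each process can report which device it is on. The stand-in object at the bottom is only there so the helper can be tried without launching a distributed job.

```python
from types import SimpleNamespace


def describe_devices(accelerator):
    """Return a one-line summary of the process count and distributed type.

    Hypothetical helper: works on any object exposing num_processes and
    distributed_type, such as an Accelerator or an AcceleratorState.
    """
    # str() on the enum gives a string like 'DistributedType.MULTI_GPU'
    return (
        f"processes={accelerator.num_processes} "
        f"type={str(accelerator.distributed_type)}"
    )


def main():
    # Imported here so the helper above can be tried without accelerate installed.
    from accelerate import Accelerator

    accelerator = Accelerator()

    # Print the overall summary once, from the main process only.
    if accelerator.is_main_process:
        print(describe_devices(accelerator))

    # Every process reports the device it was assigned.
    print(f"process {accelerator.process_index} is on {accelerator.device}")


# Stand-in object (not a real Accelerator) to show the summary format.
fake = SimpleNamespace(
    num_processes=4,
    distributed_type="DistributedType.MULTI_GPU",
)
summary = describe_devices(fake)
print(summary)
```

In a real run you would call main() from tmp.py and start it with accelerate launch tmp.py, as in the thread; with the 4-GPU configuration described above, the summary line should report 4 processes.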
That works like a charm!! Thank you!