Is it possible to print out which GPU a model is being trained on when using accelerate, as well as how many GPUs will be used for training?
You should look at the output of `accelerate env` from the CLI. You can also configure this yourself by running `accelerate config` before training.
Otherwise it’s highly dependent on how you start the script. E.g. if you call `torchrun` itself and use accelerate in the script, it’ll use all of the GPUs available.
How are you calling your script?
@muellerzr Thank you for your response!
I call my script with `accelerate launch tmp.py`.
Then it would be whatever `accelerate env` shows as configured. If that hasn’t been configured yet, then most likely it’s using all of your GPUs? (Though I think `accelerate launch` will give you an error if you haven’t configured it yet.)
I did run `accelerate config` to set things up to use all 4 GPUs. Let me rephrase my question. I’d like to be able to log the total number of GPUs accelerate is using from within my Python script `tmp.py` for informational/debugging purposes. Is it possible to get that information from the `Accelerator` object created in a Python script?
Yes, for that you’d want the following information in Accelerator:
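A minimal sketch of reading those values from an `Accelerator` instance. It assumes the attributes `num_processes` and `distributed_type` (the same names used on `AcceleratorState` in this thread) plus `process_index` and `device`, which to my knowledge are also exposed on `Accelerator`. The helper names `describe_devices` and `main` are hypothetical and only for illustration.

```python
def describe_devices(accelerator):
    """Build a one-line summary from attributes read off an Accelerator.

    Assumption: `accelerator` exposes process_index, num_processes,
    device and distributed_type.
    """
    return (
        f"process {accelerator.process_index} of {accelerator.num_processes} "
        f"on device {accelerator.device} "
        f"(distributed type: {accelerator.distributed_type!s})"
    )


def main():
    # Imported inside main() so the helper above can be tried without
    # accelerate installed.
    from accelerate import Accelerator

    accelerator = Accelerator()
    # Every launched process runs this line, so expect one line per process.
    print(describe_devices(accelerator))
```

Calling `main()` at the bottom of `tmp.py` and starting it with `accelerate launch tmp.py` should print one summary line per process, which on a multi-GPU setup usually means one line per GPU.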
You can also gather these from the AcceleratorState class by doing:
    from accelerate.state import AcceleratorState

    state = AcceleratorState()
    num_devices, device_kind = state.num_processes, state.distributed_type
(This does the same thing, so you should probably just grab them from the `accelerator` object you have made.)
`distributed_type` will return a `DistributedType`, which is an enum. You can do `str(device_kind)` to get a string that looks like the following:
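As a rough illustration of how `str()` behaves on a plain Python enum member, here is a sketch using a stand-in enum. `DemoDistributedType` and its members are hypothetical and are not accelerate's actual `DistributedType`, whose members and exact string form may differ.

```python
import enum


class DemoDistributedType(enum.Enum):
    # Hypothetical stand-in members, for illustration only.
    NO = "NO"
    MULTI_GPU = "MULTI_GPU"


kind = DemoDistributedType.MULTI_GPU

# str() on a plain Enum member yields "ClassName.MEMBER".
as_text = str(kind)

# .name gives just the member name, which is often handier in log lines.
short_name = kind.name

print(as_text)     # DemoDistributedType.MULTI_GPU
print(short_name)  # MULTI_GPU
```

If only the short label is wanted in logs, reading `.name` from the enum member avoids having to strip the class prefix from the string.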
That works like a charm!! Thank you!