Hello everyone. I am just getting started with accelerate and distributed training in general. To test the number of GPUs being used, I created a simple script containing a single main function:
from accelerate import Accelerator, DeepSpeedPlugin

def main():
    # ZeRO stage 3, no gradient accumulation
    deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
    accelerator = Accelerator(fp16=True, deepspeed_plugin=deepspeed_plugin)
    print(f'Num Processes: {accelerator.num_processes}; Device: {accelerator.device}; Process Index: {accelerator.process_index}')

main()
I launch this with
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file accelerate_config.yaml accelerate_test.py
and get the following output:
Num Processes: 2; Device: cuda:0; Process Index: 0
Num Processes: 2; Device: cuda:0; Process Index: 0
Num Processes: 2; Device: cuda:1; Process Index: 1
Num Processes: 2; Device: cuda:1; Process Index: 1
It seems like main is executed twice by each process. Is this expected behaviour?
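For what it's worth, a quick way to check would be printing the OS process ID alongside the process index (a minimal diagnostic sketch, using only os.getpid() from the standard library); that should distinguish main being called twice inside one process from the launcher starting duplicate processes:

import os

from accelerate import Accelerator, DeepSpeedPlugin

def main():
    deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
    accelerator = Accelerator(fp16=True, deepspeed_plugin=deepspeed_plugin)
    # Same PID printed twice -> main() runs twice within one process;
    # different PIDs -> extra processes were spawned by the launcher.
    print(f'PID: {os.getpid()}; Process Index: {accelerator.process_index}')

main()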
accelerate_config.yaml contains:
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  zero_stage: 3
distributed_type: DEEPSPEED
fp16: false
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2
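As far as I understand (this is an assumption on my part), flags passed to accelerate launch take precedence over the config file, so a single-process comparison run would be:
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 --config_file accelerate_config.yaml accelerate_test.py
If main still prints twice with one process, the duplication would seem to be independent of num_processes.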
Thank you.