Main code executed twice per process. Normal behaviour?

pzurrer · November 16, 2021, 10:01am

Hello everyone. I am just getting started with accelerate and with distributed training in general. To check how many GPUs are used, I wrote a small test script with a simple main function:

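from accelerate import Accelerator, DeepSpeedPlugin

def main():
    # ZeRO stage 3 with no gradient accumulation, matching accelerate_config.yaml
    deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
    accelerator = Accelerator(fp16=True, deepspeed_plugin=deepspeed_plugin)

    # Each launched process should print exactly one line
    print(f'Num Processes: {accelerator.num_processes}; Device: {accelerator.device}; Process Index: {accelerator.process_index}')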

I launch this with CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file accelerate_config.yaml accelerate_test.py. I get the following output:

Num Processes: 2; Device: cuda:0; Process Index: 0
Num Processes: 2; Device: cuda:0; Process Index: 0
Num Processes: 2; Device: cuda:1; Process Index: 1
Num Processes: 2; Device: cuda:1; Process Index: 1

It looks as if main is executed twice in each process. Is this expected behaviour?

accelerate_config.yaml contains:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: cpu
  zero_stage: 3
distributed_type: DEEPSPEED
fp16: false
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2

Thank you.

sgugger · November 16, 2021, 12:38pm

This is very weird; it's indeed not supposed to happen. You should only see two print statements, one per process.

pzurrer · November 16, 2021, 10:23pm

Hmm yeah. I'm not really sure how to debug what is happening under the hood. I use PyTorch 1.8.0 and the newest versions of both accelerate and deepspeed. Can you reproduce this on your end, or could it be related to my setup?
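One way to debug this kind of duplicated output, sketched here as an assumption rather than anything accelerate provides: wrap main in a small decorator (the name `report_calls` is hypothetical) that prints the process ID plus the file and line of the caller on every call. Any process that reports a second call also reveals where that extra call comes from.

```python
import functools
import os
import traceback


def report_calls(func):
    """Hypothetical debugging helper (not part of accelerate): log each call to
    `func` with the current PID and the file/line that called it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        # extract_stack()[-1] is this frame; [-2] is whoever called wrapper
        caller = traceback.extract_stack()[-2]
        print(
            f"[pid {os.getpid()}] {func.__name__} call #{wrapper.calls} "
            f"from {caller.filename}:{caller.lineno}"
        )
        return func(*args, **kwargs)

    wrapper.calls = 0  # number of times main has run in this process
    return wrapper


@report_calls
def main():
    # The real setup code (DeepSpeedPlugin, Accelerator, print) would go here
    pass
```

With two launched processes, a correct script should print exactly one `call #1` line per PID; a `call #2` line points straight at the duplicate call site.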

pzurrer · November 17, 2021, 10:01am

Ah, I rechecked and figured it out. I made a silly formatting error, and main was accidentally called twice in my script. Everything works as expected now. Sorry for the inconvenience.
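The script itself was not posted, so the following is only a hypothetical model of the mistake that runs without GPUs: the functions `run_script` and `launch` are illustrative stand-ins, not accelerate's launcher. It replays "run the script once per process" and shows that one stray extra call to main turns the expected two lines into the four reported in the question (2 processes × 2 calls).

```python
def main(process_index, num_processes, out):
    """Stand-in for the forum script's main(); appends instead of printing."""
    out.append(
        f"Num Processes: {num_processes}; Device: cuda:{process_index}; "
        f"Process Index: {process_index}"
    )


def run_script(process_index, num_processes, out, buggy):
    """Hypothetical model of one process executing the script body."""
    if buggy:
        main(process_index, num_processes, out)  # accidental extra call
    main(process_index, num_processes, out)      # intended single call


def launch(num_processes, buggy):
    """Rough stand-in for the launcher: run the script once per process.
    Real processes run concurrently, so their output may interleave."""
    out = []
    for rank in range(num_processes):
        run_script(rank, num_processes, out, buggy)
    return out


for line in launch(2, buggy=True):
    print(line)
```

Running `launch(2, buggy=False)` instead yields one line per process, which is the behaviour described in the reply above.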
