Hi,
As a beginner, I have a quick question about the difference in behavior between running a script with `python` versus with `accelerate launch`. For instance, take this notebook as an example: Google Colab. Say I convert it into a Python script called `qwen.py` and run it on a single node with 8 GPUs. I found that the following two commands:
python qwen.py
accelerate launch --multi_gpu --num_processes 8 qwen.py
are different, but related. Specifically, with `python qwen.py` I still see all 8 GPUs being used when inspecting `nvidia-smi`. Also, compared with `CUDA_VISIBLE_DEVICES=0 python qwen.py`, the runtime is reduced by exactly 8x. When running `python qwen.py`, I notice `trainer.accelerator.num_processes = 1` and `trainer.accelerator.state` reports "Distributed environment: NO".
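For context, this is roughly how I read those numbers (a minimal sketch using a standalone `Accelerator`, which I assume reflects the same state the trainer's accelerator sees):

```python
import torch
from accelerate import Accelerator

# Reproduces the numbers quoted above. Under plain `python qwen.py` this is a
# single process with no distributed setup, even though all 8 GPUs are visible
# to PyTorch; under `accelerate launch --multi_gpu --num_processes 8` each of
# the 8 processes prints its own view.
accelerator = Accelerator()
print("num_processes :", accelerator.num_processes)   # 1 vs. 8
print("process_index :", accelerator.process_index)   # 0 vs. 0..7
print("device        :", accelerator.device)          # cuda vs. cuda:<local rank>
print("CUDA devices  :", torch.cuda.device_count())   # 8 in both cases for me
print(accelerator.state)                               # "Distributed environment: NO" vs. "MULTI_GPU"
```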
On the other hand, with the `accelerate launch` command, I get `trainer.accelerator.num_processes = 8` and `trainer.accelerator.state` reports "Distributed environment: MULTI_GPU, Backend: nccl". All GPUs are occupied when inspecting with `nvidia-smi`; however, the occupancy pattern (e.g., memory per GPU) looks different from the `python` run.
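To make the comparison concrete, this is what I print to see the distributed environment each launcher produces (a sketch; I am assuming `accelerate launch` relies on the standard `torch.distributed` environment variables, since the state above mentions the NCCL backend):

```python
import os
import torch.distributed as dist

# Run this after the Trainer / Accelerator has been created.
# With `python qwen.py` the variables below are unset and no process group
# exists; with `accelerate launch --multi_gpu --num_processes 8` each of the
# 8 worker processes reports its own RANK / LOCAL_RANK.
for var in ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
    print(f"{var}={os.environ.get(var)}")

print("process group initialized:", dist.is_initialized())
if dist.is_initialized():
    print("backend   :", dist.get_backend())      # "nccl" in my runs
    print("world size:", dist.get_world_size())   # 8
```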
I just want to better understand what is being done under the hood, and what the difference is between running with `python` and with `accelerate launch`. Moreover, it seems to me that `python qwen.py` by itself should just run on `cuda:0`, so why does it run on all GPUs?
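In case it matters for the answer: my current guess (not verified against the notebook) is that the model is loaded with something like `device_map="auto"`, which as far as I understand lets a single process shard the weights across every visible GPU. The model id below is just a placeholder:

```python
from transformers import AutoModelForCausalLM

# Assumption: the notebook loads the model roughly like this. With
# device_map="auto", one `python qwen.py` process can place layers on all
# 8 GPUs, which would explain the nvidia-smi pattern I describe above.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",   # placeholder model id
    device_map="auto",
)

# What I expected instead: everything pinned to a single device, which is
# effectively what CUDA_VISIBLE_DEVICES=0 forces.
model_on_one_gpu = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",   # placeholder model id
).to("cuda:0")
```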