More processes than GPUs with DeepSpeed launcher


I am training a LoRA adaptation of a T5 model in a single-machine, multi-GPU setup.
I am using Transformers 4.26.1 and DeepSpeed 0.9.2, and I launch my script with the `deepspeed` launcher (so the parallelization setup is Distributed Data Parallel).
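For reference, I launch roughly like this (script and config file names are placeholders, not my exact paths): the `deepspeed` launcher spawns one worker process per GPU on the node.

```shell
# Hypothetical launch command; train_lora_t5.py and ds_config.json
# stand in for my actual script and DeepSpeed config.
deepspeed --num_gpus=4 train_lora_t5.py --deepspeed ds_config.json
```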

Just before training starts, I first see four processes on my first GPU (probably the four ranks loading their model copies, right?).

When the training starts, I have four processes running on my first GPU and then one process on each of the other GPUs. Although I am using plain PyTorch (not Lightning), I see the same issue described in this discussion: extra process when running ddp across multiple GPUs · Lightning-AI/pytorch-lightning · Discussion #9864 · GitHub.
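For context, my understanding (which may be wrong) is that the launcher exports a `LOCAL_RANK` environment variable to each worker, and each process is supposed to bind to `cuda:{LOCAL_RANK}` before touching CUDA; a process that initializes CUDA on the default device first would create an extra context on GPU 0. A minimal stdlib-only sketch of that mapping (the helper `pick_device` is my own name, not from my script):

```python
import os

def pick_device(env=None):
    """Map a worker's LOCAL_RANK (set by the deepspeed launcher)
    to the CUDA device string that worker should bind to."""
    if env is None:
        env = os.environ
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"

# Simulating the four workers the launcher would spawn on a 4-GPU node:
for rank in range(4):
    print(pick_device({"LOCAL_RANK": str(rank)}))
# Each worker should end up on its own device, cuda:0 through cuda:3;
# in a real script you would pass this to torch.cuda.set_device(...).
```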

See the attached image.

Is this normal? I have only seen examples with one process per GPU, so I would be very interested in an explanation.

Thanks!