Setup for DeepSpeed Multi-GPU Training

According to the Trainer page of the transformers 4.4.2 documentation (see "Deployment in Notebooks"), the following code should work in a notebook with multiple GPUs:

DeepSpeed requires a distributed environment even when only one process is used.

This emulates a launcher in the notebook:

import os
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '9994'  # modify if RuntimeError: Address already in use
os.environ['RANK'] = "0"
os.environ['LOCAL_RANK'] = "0"
os.environ['WORLD_SIZE'] = "1"

Now proceed as normal, plus pass the DeepSpeed config file:

training_args = TrainingArguments(..., deepspeed="ds_config.json")
trainer = Trainer(...)

However, I am struggling to get this running with 2 GPUs. There seems to be no way to manually tell DeepSpeed to use 2 GPUs. The documentation says DeepSpeed should detect them automatically, but it does not on my system: training only ever runs on 1 GPU. Depending on the RANK setting it runs on either GPU 0 or GPU 1, but never on both.
(I need to run this on 2 GPUs because I don't have an RTX 3090 with enough memory.)

Is there a way to manually tell DeepSpeed to use 2 GPUs in a Jupyter notebook, as in the example above?
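
To make the question concrete, here is a minimal sketch of the environment that a launcher would presumably have to set for two GPUs, assuming the usual torch.distributed convention of one process per GPU on a single machine. The helper name `launcher_env` is hypothetical and is not part of DeepSpeed or transformers; it only builds dictionaries and does not start any training.

```python
def launcher_env(rank, world_size, master_addr="localhost", master_port=9994):
    """Return the environment variables one worker process would need.

    Hypothetical helper for illustration only.
    Assumes a single machine, so LOCAL_RANK is the same as RANK.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in the range [0, world_size)")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "LOCAL_RANK": str(rank),
        "WORLD_SIZE": str(world_size),
    }


# One environment per GPU: two processes, each with its own rank.
envs = [launcher_env(rank, world_size=2) for rank in range(2)]
for env in envs:
    print(env["RANK"], env["LOCAL_RANK"], env["WORLD_SIZE"])
```

If this reading is right, the two environments would have to belong to two separate processes. A single notebook kernel is one process, so it can only take one of these ranks at a time, which would match the behavior of training landing on either GPU 0 or GPU 1 but never both.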