notebook_launcher with num_processes=2 still prints "Launching training on one GPU." in Kaggle

I am trying to run the code from this article in a Kaggle notebook with 2 GPUs (nvidia-smi lists 2 x Tesla T4, see the output further down). Link: Launching Multi-Node Training from a Jupyter Environment

But it always uses only one GPU in the Kaggle notebook. How can I solve this?

It prints "Launching training on one GPU." even though the notebook has 2 GPUs:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   43C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I think this is the part of the code that runs: accelerate/launchers.py at v0.15.0 · huggingface/accelerate · GitHub

What’s your version of Accelerate? Only the latest version (0.15.0) will launch in Kaggle successfully. What you pointed out there was the Google Colab check statement :slight_smile:

Thank you very much. Kaggle already had the library preinstalled, and my own installation did not take effect. I have now updated it to 0.15.0 and it is working.
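For anyone hitting the same thing, here is a minimal sketch for checking which Accelerate version the notebook kernel actually sees. It uses only the standard library; the helper names (parse_version, installed_version, needs_upgrade) are made up for illustration, and the 0.15.0 minimum is simply the version mentioned in the reply above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.15.0' into (0, 15, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum="0.15.0"):
    """True when the package is missing or older than the minimum version."""
    return installed is None or parse_version(installed) < parse_version(minimum)


current = installed_version("accelerate")
print("accelerate:", current, "| upgrade needed:", needs_upgrade(current))
```

If this still reports the old version right after a pip upgrade, the kernel has probably already imported the preinstalled copy; restarting the notebook session before importing accelerate again is the usual fix.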


Results:

Colab

epoch 0: 87.67
epoch 1: 89.31
epoch 2: 93.93
epoch 3: 96.97
epoch 4: 97.55
Total execution time = 354.609 sec

Kaggle

epoch 0: 91.68
epoch 1: 89.31
epoch 2: 94.97
epoch 3: 97.46
epoch 4: 97.51
Total execution time = 427.316 sec

Colab with 1 GPU is faster than Kaggle with 2 GPUs.

You should make sure you are actually setting up your benchmarks correctly by reading our docs on it: Comparing performance between different device setups. It is very easy to assume that running the same script does the same thing on every setup (spoiler: it does not! :slight_smile:).

I am sorry to say that it is still not running faster on 2 GPUs, and I don't know why. :sob:

This is the sample code I use: Launching Multi-Node Training from a Jupyter Environment

These are the changes I made according to Comparing performance between different device setups:

Setting Seed

set_seed(42) - the same value is used for both the 1 GPU and the 2 GPU run.

Batch Sizes

In Colab - 128

def get_dataloaders(batch_size: int = 128):
-----
def training_loop(mixed_precision="fp16", seed: int = 42, batch_size: int = 128):
-----
args = ("fp16", 42, 128)
notebook_launcher(training_loop, args, num_processes=1)

In Kaggle - 64

def get_dataloaders(batch_size: int = 64):
--
def training_loop(mixed_precision="fp16", seed: int = 42, batch_size: int = 64):
---
args = ("fp16", 42, 64)
notebook_launcher(training_loop, args, num_processes=2)
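The reasoning behind 128 vs 64, as a small sketch: assuming the default data-parallel setup where every process draws its own batch of batch_size samples, the batch seen per optimizer step is batch_size times the number of processes. The function name effective_batch_size is hypothetical, not something from Accelerate.

```python
def effective_batch_size(per_process_batch_size, num_processes):
    """Samples consumed per optimizer step across all processes."""
    return per_process_batch_size * num_processes


colab = effective_batch_size(128, num_processes=1)   # 1 GPU, batch size 128
kaggle = effective_batch_size(64, num_processes=2)   # 2 GPUs, batch size 64 each
print("Colab:", colab, "| Kaggle:", kaggle)
```

With these settings both runs process the same number of samples per step, so the two setups are comparable on batch size.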

Learning Rates

Both use the same code:

# Instantiate the optimizer
learning_rate = 3e-2 / 25
learning_rate *= accelerator.num_processes
optimizer = torch.optim.Adam(params=model.parameters(), lr=learning_rate)
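To make the numbers concrete, here is the same scaling rule worked out for both setups. This is plain arithmetic on the values from the snippet above; scaled_lr is a hypothetical helper, and accelerator.num_processes is replaced by an explicit argument so the sketch runs without any GPU.

```python
BASE_LR = 3e-2 / 25  # base learning rate from the snippet above


def scaled_lr(num_processes, base_lr=BASE_LR):
    """Linear scaling: multiply the base learning rate by the process count."""
    return base_lr * num_processes


for n in (1, 2):
    print(f"num_processes={n}: lr={scaled_lr(n):.4f}")
```

So the 2 GPU run trains with twice the learning rate of the 1 GPU run, which is one reason the per-epoch accuracies are not expected to match exactly.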

Results:

In Kaggle (2 GPU):

Launching training on 2 GPUs.
epoch 0: 89.86
epoch 1: 87.50
epoch 2: 94.30
epoch 3: 96.78
epoch 4: 97.61
Total execution time = 460.321 sec

In Colab (1 GPU):

Launching training on one GPU.
epoch 0: 87.96
epoch 1: 87.36
epoch 2: 94.15
epoch 3: 97.16
epoch 4: 97.55
Total execution time = 341.572 sec
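A quick way to put the two totals side by side, using only the times reported above. Note that these are wall-clock totals from two different platforms, so the ratio mixes hardware differences with multi-GPU overhead; it is not a clean speedup measurement.

```python
# Total execution times copied from the results above (seconds).
colab_one_gpu_sec = 341.572
kaggle_two_gpu_sec = 460.321

# A ratio above 1 means the 2 GPU run took longer than the 1 GPU run.
ratio = kaggle_two_gpu_sec / colab_one_gpu_sec
print(f"Kaggle (2 GPU) took {ratio:.2f}x the time of Colab (1 GPU)")
```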

This is the code I use to calculate the time:

import time

start_time = time.time()
args = ("fp16", 42, xx)  # xx: batch size
notebook_launcher(training_loop, args, num_processes=xx)  # xx: number of GPUs
end_time = time.time()
print("Total execution time = {:.3f} sec".format(end_time - start_time))

xx - these values change depending on the system (see the batch sizes and num_processes above).
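Timing the whole notebook_launcher call also counts process start-up and anything else that happens before the first training step. A possible refinement, sketched here with a hypothetical timed helper, is to time each epoch separately inside the training function. The sleep call is only a stand-in for one epoch of work.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label, store=timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = time.perf_counter() - start


with timed("epoch 0"):
    time.sleep(0.01)  # stand-in for one epoch of training

print({label: round(seconds, 3) for label, seconds in timings.items()})
```

When training runs in separate processes, values stored in a dictionary like this stay inside each worker, so it is safer to print them from within the training function than to read them in the notebook afterwards.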

@sgugger @patrickvonplaten any suggestions?