I’ll need a bit more info about the machine: which GPUs are you using? Also, if you could wrap your code in back-ticks, it would help a ton with readability. I just tried mimicking the same setup on a machine with 4 GPUs (but only using 3):
```json
{
  "compute_environment": "LOCAL_MACHINE",
  "distributed_type": "MULTI_GPU",
  "downcast_bf16": false,
  "machine_rank": 0,
  "main_training_function": "main",
  "mixed_precision": "no",
  "num_machines": 1,
  "num_processes": 3,
  "rdzv_backend": "static",
  "same_network": false,
  "tpu_use_cluster": false,
  "tpu_use_sudo": false,
  "use_cpu": false
}
```
Script:

```python
import torch.nn as nn

from accelerate import Accelerator

if __name__ == "__main__":
    accelerator = Accelerator()
    model = nn.Conv2d(10, 20, 3, 1, 1)
    print("prepare")
    model = accelerator.prepare(model)
    print("done")
```
Also, what versions of Accelerate and PyTorch are you running? Thanks!
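If it helps, here's a quick stdlib-only sketch for grabbing both versions without importing the packages themselves (it assumes the installed distribution names are `accelerate` and `torch`; adjust if your environment differs):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("accelerate", "torch")):
    """Return {package: version string or None} for each requested package."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    for pkg, ver in installed_versions().items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Pasting that output (or the output of any equivalent version check) into your reply would make it much easier to reproduce the issue.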