How to use a specific GPU in Accelerate?

I want to use different GPUs under different conditions.
But accelerator.device always seems to be cuda:0. I tried to change it to cuda:1 or cuda:2, but it cannot be modified. How can I fix this?

You can either set CUDA_VISIBLE_DEVICES or pass the --gpu_ids parameter, in your config or to accelerate launch.

I’m a bit confused by your response. In fact, I already added the following code at the beginning of my file, specifying GPU number 6:

os.environ['CUDA_VISIBLE_DEVICES'] = str(6)
device = torch.device(f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu")

However, after calling accelerator = Accelerator(), accelerator.device still shows device(type='cuda') and remains on device 0. If I manually set accelerator.device = self.device, it throws an error: AttributeError: can't set attribute. Could you please provide more detailed instructions on how to specify the GPU using the Accelerator? I would greatly appreciate it.

exactly the same situation!!

You cannot do this inside your Python file like that. It has to be done before your Python file is called, or (possibly) before torch, accelerate, or anything else that initializes the GPU has been imported.

So solutions:

accelerate launch --gpu_ids 6 myscript.py
CUDA_VISIBLE_DEVICES=6 python myscript.py

(I do not know whether this last solution, the one below, will actually work; I haven't tried it before.)

import os
os.environ["CUDA_VISIBLE_DEVICES"] = str(6)
import torch
...

Thank you for your prompt reply. However, when following your guidance I encountered the following error:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
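
A likely cause of this error (an assumption, since the failing line is not shown) is device renumbering: with CUDA_VISIBLE_DEVICES=6 the process sees a single GPU, and that GPU is addressed as cuda:0, so asking for cuda:6 fails. The sketch below only illustrates the renumbering in plain Python. The helper name logical_to_physical is hypothetical, and it assumes the variable holds a comma-separated list of integer indices.

```python
def logical_to_physical(cuda_visible_devices: str) -> dict:
    """Map in-process device indices (the N in cuda:N) to physical GPU indices.

    Hypothetical helper for illustration; assumes a comma-separated list of
    integer indices (GPU UUIDs are not handled).
    """
    ids = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    # The first listed GPU becomes cuda:0, the second cuda:1, and so on.
    return {logical: physical for logical, physical in enumerate(ids)}


mapping = logical_to_physical("6")
print(mapping)        # {0: 6}
print(6 in mapping)   # False: there is no cuda:6 inside this process
```

Under that assumption, code that runs with CUDA_VISIBLE_DEVICES=6 should refer to the GPU as cuda:0 (or just cuda) rather than cuda:6.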

Thank you, I solved it.

Hi, could you please explain in more detail how you solved this problem? I’m interested to understand the steps you took to resolve it.

When you select a GPU through os.environ['CUDA_VISIBLE_DEVICES'], the assignment must come before all other imports. I think I failed because I tried to change the GPU after Accelerate, torch, or TensorFlow had already been imported, so the order of the environment-variable code relative to the imports seems to be the main issue.
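
For anyone who wants to guard against that ordering mistake, here is a minimal sketch. The helper pin_gpu is hypothetical (it is not part of torch or Accelerate); it sets the variable and warns if torch was already imported, since in that case CUDA may already have been initialized and the setting may be ignored.

```python
import os
import sys
import warnings


def pin_gpu(physical_index: int) -> None:
    """Restrict this process to one physical GPU (hypothetical helper).

    Intended to be called at the very top of the script, before torch or
    accelerate are imported.
    """
    if "torch" in sys.modules:
        # Importing torch does not necessarily initialize CUDA, but once CUDA
        # has been initialized, changing the variable has no effect.
        warnings.warn(
            "torch is already imported; CUDA_VISIBLE_DEVICES may be ignored "
            "if CUDA has already been initialized"
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)


pin_gpu(6)  # GPU 6, as in the thread

# Only after this point:
# import torch
# from accelerate import Accelerator
```

With this in place, the selected GPU is the only one the process sees, so it is addressed as cuda:0 inside the script.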

Thank you, I solved it.

None of this worked for me. I added CUDA_VISIBLE_DEVICES=4 everywhere, but somehow it still uses GPU 0, even though I never specify 0 anywhere in the code. This is the crappiest thing I’ve ever experienced: an hour wasted and still not resolved. Kindly change the API to something simpler that respects environment variables. Here is the code I’m using: GitHub - thuanz123/realfill