Accelerator can't detect my GPUs?

Buggod · March 28, 2024, 7:32pm

Is my config incorrect? I believe the kernel warning is not relevant to this issue.
Would appreciate any help! I really want to use two GPUs in parallel to have more CUDA memory!

muellerzr · March 28, 2024, 7:38pm

You’re in a notebook, so you need the notebook launcher: Launching Multi-GPU Training from a Jupyter Environment
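A minimal sketch of the notebook launcher pattern, for reference. `notebook_launcher` and `PartialState` are part of Accelerate; the function names `report_process` and `launch_from_notebook` are made up for illustration, and `num_processes=2` assumes the two GPUs from this thread. The sketch assumes nothing in the notebook has touched CUDA before the launcher runs.

```python
def report_process():
    # Import inside the function so CUDA is first touched in the child
    # processes, not in the notebook kernel itself.
    from accelerate import PartialState

    state = PartialState()
    print(f"process {state.process_index} of {state.num_processes} on {state.device}")


def launch_from_notebook(num_processes=2):
    # Call this from a notebook cell; num_processes=2 assumes two GPUs.
    from accelerate import notebook_launcher

    notebook_launcher(report_process, args=(), num_processes=num_processes)
```

Calling `launch_from_notebook()` in a cell should start one process per GPU, each printing its own index and device.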

Buggod · March 28, 2024, 7:48pm

Thanks very much for the reply!
However, I’m just trying to run inference on a server. Isn’t the link about training across different IPs? How can I do this with a single line of code? I feel like it’s going to be super easy.

muellerzr · March 28, 2024, 7:58pm

If you run that code from a Python file with accelerate launch or torchrun, it will show what you want. This is a limitation of working out of a Jupyter notebook session.

If you use regular big model inference, the model is split across the GPUs but inference still runs in a single process, using one GPU after the next.

Here is an example of how to get distributed inference to work, which again requires the use of the notebook_launcher if using Jupyter.
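As a rough sketch of that distributed inference pattern: each process takes a slice of the prompts and the results are gathered at the end. `PartialState.split_between_processes` and `gather_object` are Accelerate APIs; `fake_generate` is a hypothetical stand-in for a real model call, and `run_distributed` is an illustrative name. Note that in this pattern every process holds its own full copy of the model, so the copy has to fit on a single GPU.

```python
def fake_generate(prompt):
    # Hypothetical stand-in for a real model call such as model.generate(...).
    return prompt.upper()


def run_distributed(prompts):
    # Imports live inside the function so it can also be handed to
    # notebook_launcher without touching CUDA in the parent process.
    from accelerate import PartialState
    from accelerate.utils import gather_object

    state = PartialState()
    # Each process receives its own slice of the prompt list.
    with state.split_between_processes(prompts) as my_prompts:
        my_outputs = [fake_generate(p) for p in my_prompts]
    # Collect the per-process lists back into one flat list on every process.
    all_outputs = gather_object(my_outputs)
    if state.is_main_process:
        print(all_outputs)
    return all_outputs
```

From a Python file this would be started with something like `accelerate launch --num_processes 2 your_script.py` (the file name is a placeholder); from Jupyter it would go through `notebook_launcher` instead.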

Buggod · March 28, 2024, 8:25pm

Thanks very much!

However, I still encounter issues. Can I ask why?

Buggod · March 28, 2024, 8:25pm

muellerzr · March 28, 2024, 8:26pm

That most likely means you ran out of CPU memory (RAM).

Buggod · March 28, 2024, 8:29pm

Wow, I requested 50 GB. I’m running two 16 GB V100s, so what’s the ideal amount of RAM in this case?

Buggod · March 28, 2024, 9:27pm

Sir, I fixed the memory issue, but can I ask what went wrong now?

muellerzr · March 28, 2024, 10:10pm

Why are you using the accelerator as a context manager? What documentation are you going off of? That’s mentioned absolutely nowhere in any of our documentation. I’d highly recommend starting with our basic tutorials, such as the first one I linked there and this one: Handling big models for inference
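For orientation, a hedged sketch of what big model inference can look like when the model comes from the transformers library (an assumption; the thread does not say which library or checkpoint is used). `device_map="auto"`, `torch_dtype`, `max_memory` and `hf_device_map` are real transformers/Accelerate features; `load_sharded`, `generate_once`, `model_name` and the `15GiB` per-GPU budget are illustrative placeholders. No `Accelerator` object is needed in this pattern.

```python
def load_sharded(model_name, per_gpu_budget="15GiB"):
    # model_name is a placeholder: pass whichever checkpoint you actually use.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",          # let Accelerate place layers across devices
        torch_dtype=torch.float16,  # V100s support fp16
        # Assumed budget: leave some headroom on each 16 GB card.
        max_memory={0: per_gpu_budget, 1: per_gpu_budget},
    )
    # Shows which device each block of the model was placed on.
    print(model.hf_device_map)
    return tokenizer, model


def generate_once(tokenizer, model, prompt, max_new_tokens=32):
    # Inputs go to the device holding the first layers of the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

This runs as an ordinary single Python process, in a notebook or a script, since the layers are spread over the GPUs rather than replicated per process.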

Buggod · March 29, 2024, 3:14am

Sir, upon reading those docs, I have the following questions:

My project requires me to run inference with an LLM (which needs 24+ GB in full precision) on two 16 GB V100s. It seems that the 2nd and 3rd parts of the Distributed Inference doc are relevant to my request with accelerate, while the 1st part of that doc and big model inference aren’t, even though I’ve been trying to deploy them the whole night.
Am I correct?
While the 3rd part of the Distributed Inference doc is helpful, the 2nd part, which I find the most straightforward and relevant, is missing from the doc. Its title is "2. Loading parts of a model onto each GPU and processing a single input at one time". Can you please provide that part of the doc?

Thank you very much!
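The numbers in this question can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the figures given in the thread (24 GB of weights in fp32, two 16 GB GPUs) and counts weights only; activations, the KV cache and CUDA overhead are ignored, so real headroom is smaller. The helper name `fits` is made up for illustration.

```python
def fits(model_gb_fp32, gpu_gb, num_gpus, bytes_per_param):
    # Scale the fp32 footprint (4 bytes per parameter) to the target precision.
    model_gb = model_gb_fp32 * bytes_per_param / 4
    return {
        "model_gb": model_gb,
        "fits_on_one_gpu": model_gb <= gpu_gb,
        "fits_across_gpus": model_gb <= gpu_gb * num_gpus,
    }


fp32 = fits(24, 16, 2, bytes_per_param=4)  # full precision
fp16 = fits(24, 16, 2, bytes_per_param=2)  # half precision
print(fp32)
print(fp16)
```

Under these assumptions the fp32 weights do not fit on one card but do fit when split across both, which is the big model inference case, while fp16 weights (about 12 GB) could fit on a single card with little room to spare.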
