Accelerator can't detect my GPUs?

Is my config incorrect? I believe the kernel warning is not relevant to this issue.
Would appreciate any help! I really want to use two GPUs in parallel to have more CUDA memory!

You’re in a notebook, so you need the notebook launcher :slight_smile: See Launching Multi-GPU Training from a Jupyter Environment

Thanks very much for the reply!
However, I’m just trying to run inference on a server. Isn’t the link about training across different IPs? Is there a way to do this with a single line of code? I feel like it should be super easy.

If you run that code from a Python file with accelerate launch or torchrun, you’ll see what you want. This is a limitation of working out of a Jupyter notebook session.

If you use regular big model inference, it will do simple model-parallel inference by filling one GPU after the next with layers.

Here is an example of how to get distributed inference to work, which again requires the use of the notebook_launcher if using Jupyter.

Thanks very much!

However, I still encounter issues. Can I ask why?

That most likely means you ran out of CPU memory (RAM).

Wow, I requested 50 GB. I’m running two 16 GB V100s, so what’s the ideal amount of RAM in this case?

Sir, I fixed the memory issue, but can I ask what went wrong now?

Why are you using the accelerator as a context manager? What documentation are you going off of? That’s mentioned absolutely nowhere in any of our documentation. I’d highly recommend starting with our basic tutorials, such as the first one I linked and this one: Handling big models for inference

Sir, upon reading those docs, I have the following questions:

  1. My project requires me to run inference with an LLM (24+ GB in full precision) on two 16 GB V100s. It seems the 2nd and 3rd parts of the Distributed Inference doc match my use case with Accelerate, while the 1st part of that doc and big model inference aren’t relevant, even though I’ve been trying to deploy them the whole night.
    Am I correct?
  2. While the 3rd part of the Distributed Inference doc is helpful, the 2nd part, which I find the most straightforward and relevant, is missing from the doc. Its title is "2. Loading parts of a model onto each GPU and processing a single input at one time". Can you please provide that part of the doc?

Thank you very much!