So if I understand correctly: for launching very large models such as BLOOM 176B for inference, I can do this via Accelerate as long as one node has enough GPUs and GPU memory. When I need to distribute the model across two nodes via some kind of model parallelism, I would be better off writing a custom solution. Any answer would be appreciated so that I can move on and try something different.
Thanks for your response! From the pull request, it seems like you are improving the memory balancing across the GPUs within a single node. This is great already! I am wondering whether this improved device map would also extend to multiple nodes. For example, I have 2 nodes, each with 8 A6000 48GB GPUs. I want the first half of the layers assigned to the first node (= 8 A6000s) and the second half of the layers to the second node (the other 8 A6000s).
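To make the question concrete, here is a minimal sketch of the kind of `device_map`-style dict I have in mind, splitting the layers evenly over 16 logical GPU indices. The module names (`transformer.h.{i}`, `transformer.word_embeddings`, `lm_head`, etc.) follow the BLOOM naming in `transformers`, and the helper itself is hypothetical; as far as I can tell, a `device_map` only refers to devices visible on the local host, so whether indices 8-15 could ever mean "the other node's GPUs" is exactly what I am asking.

```python
def make_device_map(num_layers: int, gpus_per_node: int = 8, num_nodes: int = 2) -> dict:
    """Hypothetical helper: spread transformer blocks evenly over all GPUs,
    keeping layers contiguous so each node holds a contiguous half."""
    total_gpus = gpus_per_node * num_nodes
    device_map = {
        # embeddings on the first GPU, final norm / head on the last GPU
        "transformer.word_embeddings": 0,
        "transformer.word_embeddings_layernorm": 0,
        "transformer.ln_f": total_gpus - 1,
        "lm_head": total_gpus - 1,
    }
    for i in range(num_layers):
        # even, contiguous split: layer i goes to GPU floor(i * total_gpus / num_layers)
        device_map[f"transformer.h.{i}"] = i * total_gpus // num_layers
    return device_map

# BLOOM-176B has 70 transformer blocks; with 16 GPUs, layers 0-34 land on
# GPU indices 0-7 (node 1) and layers 35-69 on indices 8-15 (node 2).
dm = make_device_map(70)
print(dm["transformer.h.34"], dm["transformer.h.35"])
```

Within a single node this dict could be passed as the `device_map` argument when loading the model; the cross-node half is the part I don't see a supported path for.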