How to run the 30B Meta model on two nodes with Accelerate?


I can successfully run the 30B Meta model on one node (following load_checkpoint_and_dispatch "Expected all tensors to be on the same device" for > 1 GPU devices · Issue #362 · huggingface/accelerate · GitHub). Now I am curious whether I can run the same model on two nodes, to prepare for even larger models. I ran “accelerate config” and “accelerate launch my_script.py” on both nodes, but it seems that the full model is simply loaded on each of the two nodes.

There is nothing to dispatch the model between nodes right now (and it’s probably too complicated to be added soon).

OK, that’s what I feared. Thank you very much.

So if I understand correctly, I can launch very large models such as BLOOM 176B for inference via Accelerate as long as a single node has enough GPUs and GPU memory. If I need to distribute the model across two nodes via some kind of model parallelism, I would be better off writing a custom solution. I would appreciate any answer, so that I can move on and try something different.
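To make "enough GPUs and GPU memory" concrete, here is a rough back-of-the-envelope sketch. It only counts the weights, assumes fp16 (2 bytes per parameter), and reserves a 10% headroom for activations and buffers; the headroom value and the helper names `weights_gb` and `fits_on_node` are my own assumptions for illustration, not anything from Accelerate.

```python
def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


def fits_on_node(n_params: float, n_gpus: int, gpu_gb: float,
                 bytes_per_param: int = 2, headroom: float = 0.1) -> bool:
    """True if the weights fit while keeping a `headroom` fraction of GPU memory free.

    The 10% default headroom is an assumption, not a measured value.
    """
    usable_gb = n_gpus * gpu_gb * (1.0 - headroom)
    return weights_gb(n_params, bytes_per_param) <= usable_gb


bloom_params = 176e9  # BLOOM 176B

print(weights_gb(bloom_params))                           # 352.0 GB of fp16 weights
print(fits_on_node(bloom_params, n_gpus=8, gpu_gb=48))    # one node of 8 x 48 GB GPUs
print(fits_on_node(bloom_params, n_gpus=16, gpu_gb=48))   # total memory of two such nodes
```

Under these assumptions a single node with 8 x 48 GB GPUs is too tight for 176B parameters in fp16, which is exactly the situation where the model would have to be split across nodes.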

We’re actually actively working on this; you can follow this PR: Add balanced option for auto device map creation by sgugger · Pull Request #534 · huggingface/accelerate · GitHub

Thanks for your response! From the pull request, it seems like you are improving the memory balancing across the GPUs within a single node. This is great already! I am wondering whether this improved device map would also extend to multiple nodes. For example, I have 2 nodes with 8 A6000 48GB GPUs each. I want the first half of the layers assigned to the first node (= 8 A6000) and the second half of the layers to the second node (another 8 A6000).
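For reference, the kind of layer split described here can be sketched as a plain dictionary mapping layer names to device indices, which is the general shape of a device map. This is only an illustration: the helper `split_layers` is hypothetical, the prefix `model.decoder.layers` and the count of 48 layers are assumptions about an OPT-style 30B model, and a complete map would also need entries for the embeddings and the output head. As far as I understand, the integer indices refer to GPUs visible to the local process, so a map like this covers one node only and does not reach the GPUs of a second node.

```python
def split_layers(num_layers: int, devices: list,
                 prefix: str = "model.decoder.layers") -> dict:
    """Assign consecutive blocks of layers to devices, as evenly as possible.

    `prefix` is an assumed module path; check the real names with
    `model.named_modules()` before using the result.
    """
    per_device, extra = divmod(num_layers, len(devices))
    device_map = {}
    layer = 0
    for i, dev in enumerate(devices):
        # The first `extra` devices take one additional layer each.
        count = per_device + (1 if i < extra else 0)
        for _ in range(count):
            device_map[f"{prefix}.{layer}"] = dev
            layer += 1
    return device_map


# 48 layers over the 8 GPUs of a single node: 6 consecutive layers per GPU.
device_map = split_layers(48, list(range(8)))
print(device_map["model.decoder.layers.0"], device_map["model.decoder.layers.47"])
```

Keeping consecutive layers on the same GPU means activations cross a device boundary only once per block, which is the same idea as putting the first half of the layers on one node and the second half on the other.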

@donut32 If you want to run inference on multiple nodes, you may find this project useful. It can use pipeline parallelism to run inference across multiple nodes, and it also provides a Hugging Face-compatible API.
Detailed instructions: Serving OPT-175B using Alpa - Alpa 0.1.5.dev12 documentation
A live demo: