I am currently experimenting with training some small LLaMA models (e.g. LLaMA-MoE-3.5B (2/8)). Because of the model size, it does not fit on a single RTX 3090 Ti GPU with 24 GB of VRAM for full training. Note also that, due to some requirements, I cannot use adapters to fine-tune the model; the entire model needs to be trainable.
While I can access multiple GPUs to train the model, the model's parameters/components need to be split across them in a model-parallel or pipelined scheme. However, I have not found any tutorials, repositories, or libraries on HuggingFace that support splitting a single model over multiple GPUs. (There is the "Model Parallelism" page, but it only covers the higher-level concepts of how to split a model, not the actual implementation.)
Does anyone have any suggestions on how model pipelining or model parallelism, where a single model is split across several GPUs, could be carried out?
Unfortunately, due to certain requirements (model size, available GPUs, intended use, etc.), we can only conduct multi-GPU training through model parallelism or pipelining.
To further explain the intended scenario, assume that I have a tweaked model with 4 layers:
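Something along these lines (the layer types and sizes here are just placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """A stand-in for the tweaked model: four sequential layers."""
    def __init__(self, hidden_size: int = 1024):
        super().__init__()
        self.layer1 = nn.Linear(hidden_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, hidden_size)
        self.layer4 = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        return self.layer4(x)
```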
Now if I have 4 GPUs, is there some way to manually assign each layer to a particular GPU? For example, GPU1 should only load and compute Layer1, and then pass the output to GPU2 for computation by Layer2.
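Conceptually, the naive version I can imagine looks like the sketch below, building on the toy model above (the device indices are placeholders, and I am not sure whether this approach plays nicely with the HF Trainer or whether it is the recommended way to do it):

```python
class ManuallyShardedModel(ToyModel):
    """Same toy model, but each layer is placed on its own GPU and the
    activations are moved between devices inside forward()."""
    def __init__(self, hidden_size: int = 1024):
        super().__init__(hidden_size)
        # Hypothetical one-layer-per-GPU placement.
        self.layer1.to("cuda:0")
        self.layer2.to("cuda:1")
        self.layer3.to("cuda:2")
        self.layer4.to("cuda:3")

    def forward(self, x):
        # Move the activations to each layer's device before computing.
        x = self.layer1(x.to("cuda:0"))
        x = self.layer2(x.to("cuda:1"))
        x = self.layer3(x.to("cuda:2"))
        return self.layer4(x.to("cuda:3"))
```

With this naive scheme only one GPU is busy at a time, which is why I am asking whether there is an existing library or utility that does proper pipelining instead.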