Model Parallelism

Hi, I want to pre-train LLaMA-7B on my multi-GPU system. However, because of the model's size, I run into OOM errors. I have already read several methods in this forum and in the HuggingFace documentation, but they all use from_pretrained, which does not apply to my experiment since I have to train from scratch.
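
For reference, this is roughly how I instantiate the model from scratch (the config values are my guess at the 7B architecture, not something I have verified against the paper):

```python
# Rough sketch: build LLaMA from a config with random init, no from_pretrained.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
)  # assumed 7B hyperparameters
model = LlamaForCausalLM(config)  # randomly initialized, ready for pre-training
```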

What I want is to split the model itself (not DDP or DP) and place each layer on a different GPU, i.e., model parallelism, along the lines of the toy sketch below. Is there any way or recommended solution to do this? Thank you.
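
To illustrate what I mean, here is a minimal toy sketch of a layer-wise split in plain PyTorch (simple Linear layers standing in for the LLaMA decoder blocks, and two GPUs assumed):

```python
import torch
import torch.nn as nn

class NaiveModelParallel(nn.Module):
    """Toy layer-wise split: first half of the layers on cuda:0,
    second half on cuda:1, activations moved between devices in forward()."""
    def __init__(self, hidden=4096, layers_per_gpu=2):
        super().__init__()
        self.part0 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(layers_per_gpu)]
        ).to("cuda:0")
        self.part1 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(layers_per_gpu)]
        ).to("cuda:1")

    def forward(self, x):
        x = self.part0(x.to("cuda:0"))
        x = self.part1(x.to("cuda:1"))  # move activations to the next GPU
        return x

model = NaiveModelParallel()
out = model(torch.randn(8, 4096))
out.sum().backward()  # autograd handles gradients across the two devices
```

I am looking for a way to do this (or something better, e.g. pipeline parallelism) for the full LLaMA-7B architecture trained from scratch.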