Hi! I want to run inference with a custom large model that can't fit on a single GPU. How do I enable tensor parallelism to shard the model across multiple GPUs?
For my specific case, I have a multimodal LLM made up of a ViT, a projector, and an LLM backbone, and I'm not sure which library is best to start with. Should I use DeepSpeed, Accelerate, or something else?
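For context, here is a minimal sketch of the kind of thing I'm imagining, using PyTorch's built-in tensor parallel API (`torch.distributed.tensor.parallel`). Everything model-specific is a placeholder: `load_my_model` and the module names (`model.llm.layers`, `attn.q_proj`, `mlp.up_proj`, ...) are illustrative stand-ins for my custom model, not real identifiers.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# Assume the script is launched with `torchrun --nproc_per_node=4 ...`,
# so a 4-way tensor-parallel mesh covers all local GPUs.
mesh = init_device_mesh("cuda", (4,))

# Placeholder: my custom model with a ViT encoder (model.vit), a projector
# that maps image features into the LLM embedding space (model.projector),
# and a decoder-only LLM backbone (model.llm).
model = load_my_model()

# Shard each transformer block of the LLM backbone Megatron-style:
# column-parallel for the input projections, row-parallel for the
# output projections, so each pair needs only one all-reduce.
for block in model.llm.layers:
    parallelize_module(
        block,
        mesh,
        {
            "attn.q_proj": ColwiseParallel(),
            "attn.k_proj": ColwiseParallel(),
            "attn.v_proj": ColwiseParallel(),
            "attn.o_proj": RowwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        },
    )
```

Note that in this sketch only the LLM blocks are sharded, while the ViT and projector would stay replicated on every rank. I don't know whether that's the right split for a multimodal model, which is part of what I'm asking.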