Hi! I want to run inference with a custom large model that can't fit on a single GPU. How do I enable tensor parallelism to shard the model across multiple GPUs?
For my specific case, I have a multimodal LLM made up of a ViT, a projector, and an LLM backbone, and I'm not sure which library is best to start with. Should I use DeepSpeed, Accelerate, or something else?
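For context, here is a minimal sketch of the kind of thing I'm imagining, using PyTorch's built-in tensor parallel API (`torch.distributed.tensor.parallel`). Everything model-specific is a placeholder: `load_my_model` and the module names (`model.llm.layers`, `attn.q_proj`, `mlp.up_proj`, ...) are illustrative stand-ins for my custom model, not real identifiers.

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# Assume the script is launched with `torchrun --nproc_per_node=4 ...`,
# so a 4-way tensor-parallel mesh covers all local GPUs.
mesh = init_device_mesh("cuda", (4,))

# Placeholder: my custom model with a ViT encoder (model.vit), a projector
# that maps image features into the LLM embedding space (model.projector),
# and a decoder-only LLM backbone (model.llm).
model = load_my_model()

# Shard each transformer block of the LLM backbone Megatron-style:
# column-parallel for the input projections, row-parallel for the
# output projections, so each pair needs only one all-reduce.
for block in model.llm.layers:
    parallelize_module(
        block,
        mesh,
        {
            "attn.q_proj": ColwiseParallel(),
            "attn.k_proj": ColwiseParallel(),
            "attn.v_proj": ColwiseParallel(),
            "attn.o_proj": RowwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        },
    )
```

Note that in this sketch only the LLM blocks are sharded, while the ViT and projector would stay replicated on every rank. I don't know whether that's the right split for a multimodal model, which is part of what I'm asking.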