How to do distributed inference for large models with multiple processes?

I have a 20B-parameter model and 4 A100 GPUs with 40GB of memory each. I want to create two processes, each owning 2 GPUs, so that I can run inference in fp16. How can I do that with accelerate?

I have solved this problem by using DeepSpeed inference.
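For readers who land on this thread: below is a minimal sketch of what a DeepSpeed-inference setup for this scenario can look like. This is not @NobelHu's actual code; it assumes a Hugging Face causal LM checkpoint, uses `deepspeed.init_inference` to shard the model across the 2 GPUs owned by one launcher instance, and the model name `my-20b-model` is a placeholder.

```python
# Hedged sketch of fp16 inference with DeepSpeed tensor parallelism (mp_size=2).
# Not the original poster's code; "my-20b-model" is a placeholder checkpoint.
import os

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

local_rank = int(os.getenv("LOCAL_RANK", "0"))  # set by the deepspeed launcher

model_name = "my-20b-model"  # placeholder: substitute your 20B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the model across the 2 GPUs assigned to this launcher instance.
engine = deepspeed.init_inference(
    model,
    mp_size=2,          # tensor-parallel degree: 2 GPUs per model replica
    dtype=torch.half,   # fp16 inference
    replace_with_kernel_inject=True,  # swap in DeepSpeed's fused inference kernels
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
if local_rank == 0:
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

To get the two-replica layout from the question (2 processes × 2 GPUs each), you would launch the script twice, pinning each launch to its own pair of GPUs and giving each its own rendezvous port, e.g. `deepspeed --include localhost:0,1 --master_port 29500 run_inference.py` and `deepspeed --include localhost:2,3 --master_port 29501 run_inference.py` (the script name and ports here are illustrative).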


Hi @NobelHu, glad that you managed to make it work! Would you mind sharing your solution with the community? We are also thinking about linking a DeepSpeed inference example in our docs so that everyone benefits from it.

Also want to know!