I have a 20B-parameter model and four A100 GPUs with 40 GB of memory each. I want to create two processes, each owning two GPUs, so that I can run inference in fp16. How can I do that with accelerate?
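For context, here is a rough back-of-the-envelope sketch of why this setup needs two GPUs per model copy. It only counts the fp16 weights (2 bytes per parameter) and ignores activations and the KV cache, so the real requirement is higher. The helper names `fp16_weight_gib`, `gpu_groups`, and `visible_devices` are hypothetical, made up for this illustration, and are not part of accelerate or DeepSpeed.

```python
from typing import List


def fp16_weight_gib(num_params: float) -> float:
    """Approximate size of the weights alone in fp16 (2 bytes per parameter), in GiB."""
    return num_params * 2 / 1024**3


def gpu_groups(num_gpus: int, gpus_per_process: int) -> List[List[int]]:
    """Split GPU indices into contiguous groups, one group per process."""
    return [
        list(range(start, start + gpus_per_process))
        for start in range(0, num_gpus, gpus_per_process)
    ]


def visible_devices(group: List[int]) -> str:
    """Format a GPU group as a value for the CUDA_VISIBLE_DEVICES environment variable."""
    return ",".join(str(i) for i in group)


if __name__ == "__main__":
    # 20B parameters in fp16: roughly 37 GiB of weights, too tight for a single 40 GB card.
    print(f"fp16 weights: {fp16_weight_gib(20e9):.2f} GiB")
    # Four GPUs split into two processes with two GPUs each.
    for rank, group in enumerate(gpu_groups(4, 2)):
        print(f"process {rank}: CUDA_VISIBLE_DEVICES={visible_devices(group)}")
```

One way to use such a split, as an assumption rather than a confirmed recipe from this thread, is to start each process with its own `CUDA_VISIBLE_DEVICES` value so that it only sees its two GPUs, and then shard the model across those two devices inside that process.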
I have solved this problem by using DeepSpeed inference.
Hi @NobelHu, glad you managed to make it work! Would you mind sharing your solution with the community? We are also thinking about linking to a DeepSpeed inference example in our docs so that everyone can benefit from it.
I would also like to know!