Data parallelism for multi-GPU inference

Hi everyone,

I need to run inference on a huge amount of data, and I would like to distribute a pre-trained HF model across multiple GPUs. My problem is therefore data parallelism rather than model parallelism. I have seen that DataParallel (DP) does not support the model.generate() method. Do you have any suggestions for running inference on multiple GPUs?
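To make the question concrete, here is a minimal sketch of the kind of setup I had in mind: split the inputs into one shard per GPU, then let each process load its own copy of the model and call model.generate() on its shard only. The checkpoint name ("gpt2"), the prompts, and the generation settings are just placeholders for illustration; the sharding/spawning pattern is the part I am unsure about.

```python
def shard(data, num_shards):
    """Split `data` into `num_shards` near-equal contiguous chunks."""
    base, extra = divmod(len(data), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards

def worker(rank, shards):
    # Hypothetical worker: each spawned process binds to one GPU,
    # loads its own copy of the model, and generates for its shard only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    device = f"cuda:{rank}"
    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
    model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
    model.eval()
    for prompt in shards[rank]:
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=20)
        print(rank, tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    import torch
    import torch.multiprocessing as mp
    prompts = [f"Example prompt {i}" for i in range(100)]  # placeholder data
    n_gpus = torch.cuda.device_count()
    mp.spawn(worker, args=(shard(prompts, n_gpus),), nprocs=n_gpus)
```

Is one-process-per-GPU like this a reasonable approach, or is there a better-supported way (e.g. via Accelerate) for this use case?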

Thanks a lot!