Hi everyone,
I need to run inference on a huge amount of data, and I would like to load the same pre-trained HF model onto multiple GPUs. My problem is therefore data parallelism rather than model parallelism. I have seen that DataParallel (DP) does not support the model.generate() method. Do you have any suggestions for multi-GPU inference?
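To make the setup concrete, here is a minimal, framework-free sketch of the data-parallel part I have in mind: each process (one per GPU) would take its own slice of the inputs, and the model replicas would each call generate() on their slice only. The `shard`, `rank`, and `world_size` names are just illustrative, not from any particular library:

```python
# Sketch: split the input data across processes so each GPU
# replica only runs generate() on its own shard (data parallelism).
def shard(data, rank, world_size):
    # Round-robin slicing keeps the shards balanced in size.
    return data[rank::world_size]

prompts = [f"prompt {i}" for i in range(10)]
world_size = 4  # e.g. one process per GPU

shards = [shard(prompts, rank, world_size) for rank in range(world_size)]

# Every prompt lands in exactly one shard.
assert sum(len(s) for s in shards) == len(prompts)
```

In the real run, each process would load the model on its assigned GPU, call model.generate() on its shard, and the outputs would be gathered at the end. Is there a recommended way to do this with HF tooling?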
Thanks a lot!