Currently, any inference via trainer.predict uses only 1 GPU for all the computations. Has anyone parallelized this? That is, split the data among all available GPUs, run inference on each, and aggregate the metrics once all processes are done?
It is supported: have a look at our examples, which all use distributed evaluation when you have several GPUs.
No, it doesn’t run in parallel. Trainer.predict() keeps using 1 GPU while the other 7 sit idle.
As a workaround, I wrote my own version that splits the data, runs trainer.predict on each of the N GPUs, and then gathers the outputs. This is roughly N times faster than the default single-GPU usage of trainer.predict().
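For anyone wanting to try the same shard-and-gather approach, here is a minimal sketch of the idea. Note the `predict_shard` function is just a stand-in: in a real run each worker process would pin one GPU (e.g. via `CUDA_VISIBLE_DEVICES`) and call `trainer.predict` on its shard, and you would gather logits/metrics instead of the toy values here.

```python
import multiprocessing as mp

def shard(data, n_shards):
    """Split data into n_shards contiguous chunks of near-equal size,
    preserving the original order across shards."""
    k, r = divmod(len(data), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        end = start + k + (1 if i < r else 0)  # first r shards get one extra item
        shards.append(data[start:end])
        start = end
    return shards

def predict_shard(shard_data):
    # Stand-in for running trainer.predict(...) on one GPU.
    return [x * 2 for x in shard_data]

if __name__ == "__main__":
    data = list(range(10))
    n_gpus = 4  # in practice: torch.cuda.device_count()
    with mp.Pool(n_gpus) as pool:
        parts = pool.map(predict_shard, shard(data, n_gpus))
    # Concatenating in shard order restores the original example order.
    preds = [p for part in parts for p in part]
```

Since the shards are contiguous and gathered in order, the concatenated predictions line up with the original dataset, so per-example metrics can be computed afterward as usual.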