Currently, any inference via trainer.predict uses only 1 GPU for all the computations. Has anyone parallelized this? That is, split the data among all available GPUs, run inference on each, and aggregate the metrics once all processes are done?
It is supported: have a look at our examples, which all use distributed evaluation when you have several GPUs.
No, it doesn’t run in parallel. Trainer.predict() keeps using 1 GPU while the other 7 sit idle.
As a workaround, I wrote my own version that splits the data, runs trainer.predict on each of the N GPUs, and then gathers the outputs. This is roughly N times faster than the default single-GPU usage of trainer.predict().
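For anyone wanting to try the same shard-and-gather approach, here is a minimal sketch of the idea. Note the `predict_shard` function is just a stand-in: in a real run each worker process would pin one GPU (e.g. via `CUDA_VISIBLE_DEVICES`) and call `trainer.predict` on its shard, and you would gather logits/metrics instead of the toy values here.

```python
import multiprocessing as mp

def shard(data, n_shards):
    """Split data into n_shards contiguous chunks of near-equal size,
    preserving the original order across shards."""
    k, r = divmod(len(data), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        end = start + k + (1 if i < r else 0)  # first r shards get one extra item
        shards.append(data[start:end])
        start = end
    return shards

def predict_shard(shard_data):
    # Stand-in for running trainer.predict(...) on one GPU.
    return [x * 2 for x in shard_data]

if __name__ == "__main__":
    data = list(range(10))
    n_gpus = 4  # in practice: torch.cuda.device_count()
    with mp.Pool(n_gpus) as pool:
        parts = pool.map(predict_shard, shard(data, n_gpus))
    # Concatenating in shard order restores the original example order.
    preds = [p for part in parts for p in part]
```

Since the shards are contiguous and gathered in order, the concatenated predictions line up with the original dataset, so per-example metrics can be computed afterward as usual.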