Make compute_metrics / load_metric distributed and use multiple test sets with do_predict

I am using the Transformers PyTorch examples to do machine translation, and I am computing multiple metrics on large validation sets.

  1. How is it possible to integrate distributed compute_metrics with Seq2SeqTrainer (with or without predict_with_generate, just during do_eval)? I took a look at the distributed usage of load_metric, but I am not sure whether it can be integrated without implementing a custom train/evaluation loop.

  2. How is it possible to use two test sets with do_predict?

Thanks,

As long as you launch your script with the torch.distributed launcher, the evaluation will automatically be distributed: each GPU will see a fraction of your dataset and make predictions on it.
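Here is a minimal sketch of what that looks like with Seq2SeqTrainer (the checkpoint name, the sacrebleu metric, and the dataset wiring are placeholders, not from the original scripts):

```python
# Minimal sketch of distributed evaluation with Seq2SeqTrainer, e.g. launched as:
#   torchrun --nproc_per_node=4 run_translation.py ...
# Each process evaluates its shard of the eval set; the Trainer gathers the
# predictions across GPUs before compute_metrics is called, so no custom
# evaluation loop is needed.
import numpy as np
from datasets import load_metric
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "Helsinki-NLP/opus-mt-en-de"  # placeholder seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
bleu = load_metric("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Labels use -100 for ignored positions; swap in pad tokens so they decode.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(predictions=decoded_preds,
                          references=[[label] for label in decoded_labels])
    return {"bleu": result["score"]}

training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,  # generate during eval so metrics see real outputs
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    # train_dataset=..., eval_dataset=...  (your tokenized datasets go here)
)
```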

To pass along two datasets for predictions, you will need to tweak the example script to run the predict method on those two datasets.
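For example (a sketch; test_dataset_a and test_dataset_b stand in for your two tokenized test sets, and metric_key_prefix just namespaces the reported metrics):

```python
# Sketch of a tweaked do_predict section that evaluates two test sets in turn.
for name, dataset in [("test_a", test_dataset_a), ("test_b", test_dataset_b)]:
    predict_results = trainer.predict(dataset, metric_key_prefix=name)
    trainer.log_metrics(name, predict_results.metrics)
    trainer.save_metrics(name, predict_results.metrics)
```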


@sgugger I am using DeepSpeed ZeRO Stage 3 and am passing a custom compute_metrics() to the Trainer. I have 4 GPUs (devices), and the compute_metrics function is being invoked by all of them. Moreover, all the points in the entire eval dataset (say there are N datapoints) seem to be sent to the compute_metrics of every device. Am I missing something here?
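If I understand the evaluation loop correctly, this is expected: the Trainer gathers the predictions from every device before calling compute_metrics, so each rank ends up with all N points. A rough sketch of one way to avoid the duplicated work (the rank guard is my own workaround, not something from this thread):

```python
# Sketch: compute_metrics receives the already-gathered predictions on every
# rank, so heavy metric computation can optionally be limited to rank 0.
import torch.distributed as dist

def compute_metrics(eval_preds):
    if dist.is_available() and dist.is_initialized() and dist.get_rank() != 0:
        return {}  # other ranks skip the duplicated heavy work
    preds, labels = eval_preds
    # ... compute the expensive metrics once, on the main process ...
    return {"dummy_metric": 0.0}  # placeholder return value
```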