Distributed inference for datasets created on the fly

Hi @lhoestq, thanks so much for engaging. I agree this is unusual; it's new research with LLMs that I'm carrying out for my PhD. What happens is that I have the LLM do error correction on output it generated earlier, and a deterministic program checks whether a given output needs correction. This program works on strings, so I need to detokenize and use some outside information to decide whether to query the model again for a given example. Since compute_metrics runs at the end of the generation step, converts everything to strings, and can receive the metadata it needs to correct the errors, I thought I could also use it to build the dataset_dict. I wasn't sure how to ensure that not all workers end up pre-processing the resulting dataset, so thanks so much for your input.
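For concreteness, the selection step could be sketched roughly like this. Everything here is illustrative: `needs_correction` stands in for my deterministic string-level checker, and the metadata keys are made up, not from any library.

```python
# Hypothetical sketch of the string-level re-query decision.
# `needs_correction` is a stand-in for the deterministic checker;
# the metadata format is invented for illustration.

def needs_correction(text: str, metadata: dict) -> bool:
    # Toy rule: re-query when the detokenized output lacks the expected suffix.
    return not text.endswith(metadata.get("expected_suffix", "."))

def select_for_requery(outputs, metadata_list):
    """Return indices of examples that should be sent back to the model."""
    return [
        i for i, (text, meta) in enumerate(zip(outputs, metadata_list))
        if needs_correction(text, meta)
    ]

outputs = ["The answer is 4.", "Incomplete output"]
metas = [{"expected_suffix": "."}, {"expected_suffix": "."}]
print(select_for_requery(outputs, metas))  # → [1]
```

Inside compute_metrics I would run something like `select_for_requery` over the detokenized predictions and use the surviving indices to assemble the new dataset_dict for the next round of queries.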