Fastest way to do inference on a large dataset in huggingface?

Hello there!

Here is my issue. I have a model trained with huggingface (yay) that works well. Now I need to perform inference and compute predictions on a very large dataset of several million observations.

What is the best way to proceed here? Should I write the prediction loop myself? Which routines in datasets would be useful here? My computer has lots of RAM (100GB+), 20+ CPUs and a big GPU card.


Hi ! If you have enough RAM I guess you can use any tool to load your data.

However, if your dataset doesn’t fit in RAM, I’d suggest using the datasets library, since it lets you load datasets without filling up your RAM and still gives excellent performance.

Then I guess you can write your prediction loop yourself. Just make sure to pass batches of data to your model so that your GPU is fully utilized. You can also use map() with batched=True and set batch_size to a reasonable value.
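A minimal sketch of the batched map() approach, for illustration only. `make_predict_fn` and `toy_model` are hypothetical names that are not part of datasets; `toy_model` stands in for a real forward pass (tokenize, move to GPU, run the model), and the column name "text" is an assumption.

```python
from typing import Callable, Dict, List


def make_predict_fn(model_fn: Callable[[List[str]], List[int]], text_column: str = "text"):
    """Wrap a batch-level model call so it can be passed to Dataset.map(batched=True)."""

    def predict(batch: Dict[str, List]) -> Dict[str, List]:
        # With batched=True, `batch` maps each column name to a list of values.
        return {"prediction": model_fn(batch[text_column])}

    return predict


def toy_model(texts: List[str]) -> List[int]:
    # Hypothetical stand-in for a real model: one integer label per input text.
    return [len(t) % 2 for t in texts]


if __name__ == "__main__":
    from datasets import Dataset  # requires `pip install datasets`

    ds = Dataset.from_dict({"text": ["a", "bb", "ccc", "dddd"]})
    # batch_size controls how many rows reach the model per call.
    ds = ds.map(make_predict_fn(toy_model), batched=True, batch_size=2)
    print(ds["prediction"])
```

With a real model, batch_size is usually the main knob: raise it until GPU memory is close to full.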

Hey @lhoestq,

I have a large dataset that I want to use for eval and other tasks that require a trained model to run inference on it. (For context: I am using a translation model to translate multiple SFT and DPO datasets from English into multiple other languages.)

I’ve been using the .map() function from datasets with batched=True, and batch_size specified.

The problem is that the inference model takes way too long to process even a couple of thousand examples.

I have lots of VRAM and several GPUs, so I can launch multiple instances of the same model on a single GPU and also spread them across multiple GPUs.

Is there a way to use the map() function for batched inference while utilising multiple instances of the model, to get more throughput (more samples processed per second)?

Something like multithreading/multiprocessing, where each thread or process accesses a separate instance of the model.

Hi ! Yes, there is a code example in the docs of multi-GPU inference using map() with multiprocessing.
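For readers who want the general shape, here is a rough sketch of that pattern. It is not the docs example verbatim. It relies on map() accepting with_rank=True and num_proc; the helper `device_for_rank`, the cache `_MODEL_CACHE`, the column name "text" and the checkpoint name are placeholders chosen for illustration.

```python
from typing import Dict, List, Optional


def device_for_rank(rank: Optional[int], num_gpus: int) -> str:
    """Map a worker rank to a device string; ranks wrap around the available GPUs."""
    if num_gpus <= 0:
        return "cpu"
    # rank may be None when map() runs without multiprocessing.
    return f"cuda:{(rank or 0) % num_gpus}"


_MODEL_CACHE = {}  # hypothetical per-process cache: the model is loaded once per worker


def translate_batch(batch: Dict[str, List[str]], rank: Optional[int]) -> Dict[str, List[str]]:
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "Helsinki-NLP/opus-mt-en-fr"  # placeholder checkpoint, swap in your own model
    device = device_for_rank(rank, torch.cuda.device_count())
    if device not in _MODEL_CACHE:
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSeq2SeqLM.from_pretrained(name).to(device).eval()
        _MODEL_CACHE[device] = (tok, model)
    tok, model = _MODEL_CACHE[device]
    inputs = tok(batch["text"], return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        out = model.generate(**inputs)
    return {"translation": tok.batch_decode(out, skip_special_tokens=True)}


if __name__ == "__main__":
    import torch
    from datasets import Dataset
    from multiprocess import set_start_method

    set_start_method("spawn")  # CUDA does not work in forked worker processes
    ds = Dataset.from_dict({"text": ["Hello world.", "How are you?"] * 8})
    # One worker per GPU here; because ranks wrap around, a larger num_proc
    # would place several model instances on the same GPU.
    num_proc = max(torch.cuda.device_count(), 1)
    ds = ds.map(translate_batch, batched=True, batch_size=8, with_rank=True, num_proc=num_proc)
    print(ds["translation"][:2])
```

Each worker process keeps its own copy of the model, so the number of workers is bounded by the available VRAM as well as the GPU count.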

Let me know how it goes !