What's the best way to speed up inference on a large dataset?


I have been trying to run inference with a model I've fine-tuned on a large dataset.
I've done it this way: Summary of the tasks
That means iterating over all the questions and contexts one at a time, but it's too slow.

This way from the course seems to be quite OK, but I run into memory issues, I assume because the whole dataset is put into a single dict and moved to the device at once?

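import torch
from transformers import AutoModelForQuestionAnswering

# Moves every column of the whole eval set to the device as one batch
batch = {k: eval_set_for_model[k].to(device) for k in eval_set_for_model.column_names}
trained_model = AutoModelForQuestionAnswering.from_pretrained(trained_checkpoint).to(device)

with torch.no_grad():
    outputs = trained_model(**batch)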

Is there some way I can pass the dataset directly, like I would in Lightning, and iterate over the batches dynamically?
I.e. instead of building the batch manually as above, do something like

for batch in iter(dataset):
    pred = model(**batch)


Thanks a lot in advance

You may find the discussion on pipeline batching useful. I think batching is usually only worth it when running on GPU. If you are doing inference on CPU, looking into ONNX might make sense (it's probably only worth the effort if you are going to run inference multiple times; if it's a one-time thing you might just prefer to wait a bit longer!)
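
On the memory issue in the question: the snippet there moves the whole eval set to the device in one go, so a rough way around it is to slice the columns into small batches and run them one at a time. Below is a minimal sketch in plain Python, not the course's code. `iter_batches`, `predict_in_batches` and `fake_predict` are hypothetical names I made up for illustration, and `fake_predict` only stands in for the real model call. With PyTorch you would normally let `torch.utils.data.DataLoader` do the slicing and call the model inside `torch.no_grad()`.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Batch = Dict[str, List]


def iter_batches(columns: Dict[str, Sequence], batch_size: int) -> Iterator[Batch]:
    """Lazily yield column-wise slices of at most `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # All columns are assumed to have the same number of rows.
    n_rows = len(next(iter(columns.values()))) if columns else 0
    for start in range(0, n_rows, batch_size):
        yield {
            name: list(col[start:start + batch_size])
            for name, col in columns.items()
        }


def predict_in_batches(
    columns: Dict[str, Sequence],
    batch_size: int,
    predict_fn: Callable[[Batch], List],
) -> List:
    """Run `predict_fn` on one batch at a time and collect the results."""
    results: List = []
    for batch in iter_batches(columns, batch_size):
        # Only this batch would need to be on the device at this point.
        results.extend(predict_fn(batch))
    return results


def fake_predict(batch: Batch) -> List[int]:
    """Hypothetical stand-in for the model: sums the token ids of each row."""
    return [sum(ids) for ids in batch["input_ids"]]


# Toy data in the same column-wise layout as a tokenized dataset.
toy_columns = {
    "input_ids": [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]],
    "attention_mask": [[1, 1]] * 5,
}
predictions = predict_in_batches(toy_columns, batch_size=2, predict_fn=fake_predict)
```

The batch size is the knob here: larger batches use the GPU better but need more memory, so it is worth trying a few values against the time limit.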


Thanks for the answer.
Yes, I've tried pipeline batching, but I can't seem to feed the dataset into the pipeline for Q&A (it works for classification etc., but for Q&A I get asked to make a dict out of it).
It’s running on GPU.

I'm doing it for the Kaggle student NLP project, so it's just that I have the 9-hour inference limit. :sweat_smile:

Do you know how to do pipeline batching for q&a?