Does batching in the standard question-answering pipeline provide a speedup?

GPN · October 18, 2021, 3:53pm

Hi there.

I am using the QA pipeline and hoping to get a speedup through batching. Basically, this is what I want (pseudocode, not exact code; I would appreciate your guidance on doing it right):

    from transformers import pipeline
    # "deberta" stands in for a real question-answering checkpoint name
    _classifier = pipeline("question-answering", model="deberta")
    result = _classifier(question=["What is the goal?","When did it happen?","Who did it?"], context="Once upon a time many years ago an engineer set up a question answering machine and it was hoped it would run really fast")

So the basic goal is to get this to work at roughly the same speed as a single question, assuming I give it a decent GPU capable of parallelizing the 3 questions.

I did search before asking this question, and I found a few others who have asked it in different ways, but they have not yet been answered:

Batched pipeline · Issue #6327 · huggingface/transformers · GitHub

[Benchmark] Pipeline for question answering · Issue #3007 · huggingface/transformers · GitHub

Many thanks for any pointers or help!

shaked571 · December 13, 2021, 12:44pm

In my experience, it doesn’t help.
Also, when you look at the code, you can see that once it encodes all the given examples, it iterates over them one by one (transformers.pipelines — transformers 4.0.0 documentation):

all_answers = []
for features, example in zip(features_list, examples):
    model_input_names = self.tokenizer.model_input_names + ["input_ids"]
    fw_args = {k: [feature.__dict__[k] for feature in features] for k in model_input_names}

    # Manage tensor allocation on correct device
    with self.device_placement():
        if self.framework == "tf":
            fw_args = {k: tf.constant(v) for (k, v) in fw_args.items()}
            start, end = self.model(fw_args)[:2]
            start, end = start.numpy(), end.numpy()
        else:
            with torch.no_grad():
                # Retrieve the score for the context tokens only (removing question tokens)
                fw_args = {k: torch.tensor(v, device=self.device) for (k, v) in fw_args.items()}
                start, end = self.model(**fw_args)[:2]
                start, end = start.cpu().numpy(), end.cpu().numpy()
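
If the per-example loop above is the bottleneck, one workaround is to skip the pipeline and batch the questions yourself. Below is a minimal sketch, not the pipeline's own implementation. It assumes a PyTorch model, a fast tokenizer, and a context short enough to fit in the model's maximum length (no striding over long documents). The names `best_span`, `answer_batch`, and `model_name` are made up for illustration, and the span selection is a simplified stand-in for the pipeline's post-processing.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) with the highest start + end score,
    subject to start <= end < start + max_answer_len."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best


def answer_batch(questions, context, model_name):
    """Answer several questions about one context with a single forward pass."""
    # Local imports, so best_span stays usable without these libraries installed.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    # Pad every (question, context) pair to the same length so they form one batch.
    enc = tokenizer(
        questions,
        [context] * len(questions),
        padding=True,
        truncation="only_second",
        return_tensors="pt",
    ).to(device)

    with torch.no_grad():
        out = model(**enc)  # one forward pass for all questions

    answers = []
    for i in range(len(questions)):
        start = out.start_logits[i].tolist()
        end = out.end_logits[i].tolist()
        # sequence_ids needs a fast tokenizer; 1 marks context tokens.
        for j, seq_id in enumerate(enc.sequence_ids(i)):
            if seq_id != 1:
                start[j] = end[j] = float("-inf")  # exclude question/special/pad tokens
        s, e = best_span(start, end)
        token_ids = enc["input_ids"][i][s : e + 1].tolist()
        answers.append(tokenizer.decode(token_ids, skip_special_tokens=True))
    return answers
```

You would call it as `answer_batch(["What is the goal?", "When did it happen?", "Who did it?"], context, model_name=...)` with whichever question-answering checkpoint you are using. Whether this actually beats three separate calls depends on the GPU and on how much padding the batch needs, so it is worth timing both.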
