Hi there.
I am using the QA pipeline and hoping to get speedup through batching. Basically what I want (this is pseudo code, not exact code; I would appreciate your guidance in doing it right):
_classifier = pipeline("question-answering", model="deberta")
result = _classifier(question=["What is the goal?","When did it happen?","Who did it?"], context="Once upon a time many years ago an engineer set up a question answering machine and it was hoped it would run really fast")
So the basic goal is to get this to work at roughly the same speed as a single question, assuming I give it a decent GPU capable of parallelizing the 3 questions.
I did search before asking this question I found a few others who have asked it in different ways but not yet answered:
Batched pipeline · Issue #6327 · huggingface/transformers · GitHub
[Benchmark] Pipeline for question answering · Issue #3007 · huggingface/transformers · GitHub
Many thanks for any pointers or help!