Batching on Vanilla CPU for Inference

Shamik · July 17, 2023, 7:11am

Hello,

Infrastructure: Vanilla CPU (any Intel-based CPU without a specialised ASIC, TPU, or other accelerator)

How does the transformers pipeline class handle batching on a CPU-only system, e.g.

 from transformers import pipeline

 # model and tokenizer are assumed to be loaded already
 pipe = pipeline(task='token-classification',
                 model=model,
                 tokenizer=tokenizer,
                 batch_size=128,
                 device='cpu')

If the above code (with device changed to 'cuda') were running on a GPU, it would perform batched inference, unless the batch size is too large and it hits an out-of-memory (OOM) allocation error.

However, how would the same code behave on a CPU? How does the pipeline class natively handle batching when the CPU isn't capable of batch computation?
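
To be concrete about what I mean by batch inference, here is a rough sketch in plain Python. It is only an illustration of the idea, not how transformers implements it: `chunked`, `fake_forward`, and `run_batched` are hypothetical names of mine, and `fake_forward` is a stand-in for one model forward pass. Inputs are grouped into chunks of `batch_size`, and each chunk goes through a single forward call.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def fake_forward(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for one model forward pass over a whole batch."""
    # Returns a word count per input, just so there is something to check.
    return [len(text.split()) for text in batch]


def run_batched(texts: Sequence[str], batch_size: int) -> List[int]:
    """Make one forward call per batch and flatten results in input order."""
    results: List[int] = []
    for batch in chunked(texts, batch_size):
        results.extend(fake_forward(batch))
    return results


texts = ["a b c", "d e", "f", "g h i j", "k"]
batches = list(chunked(texts, batch_size=2))
outputs = run_batched(texts, batch_size=2)
```

Nothing in this grouping step depends on the device, so my question is really about what happens inside each forward call when it runs on a CPU.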

Thank you
