Infrastructure: vanilla CPU (any Intel-based CPU without any specialised ASICs, TPUs, or other accelerators)
How does the transformers pipeline class handle batching on a CPU-only system? For example:
pipeline(task='token-classification', model=model, tokenizer=tokenizer, batch_size=128, device='cpu')
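For context, here is a minimal runnable sketch of that call, assuming a list of input sentences and an example checkpoint (dslim/bert-base-NER is just an illustrative choice; any token-classification model would do):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "dslim/bert-base-NER"  # example checkpoint, not the one I actually use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner = pipeline(
    task="token-classification",
    model=model,
    tokenizer=tokenizer,
    batch_size=128,   # how many inputs the pipeline groups per forward pass
    device="cpu",     # device=-1 also selects CPU on older transformers versions
)

sentences = ["Hugging Face is based in New York City."] * 512
results = ner(sentences)  # inputs are tokenized, padded, and forwarded in chunks of 128
print(results[0])
```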
If the above code were run on a GPU (changing device to 'cuda'), it would perform batched inference, unless the batch size is too large and it hits an OOM allocation error.
However, how would the same code behave on a CPU? How does the pipeline class natively handle batching when the CPU can't parallelise a batch the way a GPU does?
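One way I could probe this empirically is a small hypothetical micro-benchmark (reusing the model, tokenizer, and sentences from the sketch above) that times the same workload at a few batch sizes on CPU:

```python
import time

from transformers import pipeline

# Hypothetical micro-benchmark: time the same CPU workload at a few batch sizes.
# Reuses `model`, `tokenizer`, and `sentences` from the sketch above.
for bs in (1, 8, 32, 128):
    ner = pipeline(
        task="token-classification",
        model=model,
        tokenizer=tokenizer,
        batch_size=bs,
        device="cpu",
    )
    start = time.perf_counter()
    _ = ner(sentences)
    print(f"batch_size={bs}: {time.perf_counter() - start:.2f}s")
```

But even if the timings differ, that doesn't tell me what the pipeline is doing internally on CPU, which is what I'd like to understand.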