Infrastructure: vanilla CPU (any Intel-based CPU without any specialised ASICs, TPUs, or other accelerators)
How does the transformers pipeline class handle batching on a CPU-only system? For example:
pipeline(task='token-classification', model=model, tokenizer=tokenizer, batch_size=128, device='cpu')
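For context, here is a minimal runnable sketch of that call, assuming a list of input sentences and an example checkpoint (dslim/bert-base-NER is just an illustrative choice; any token-classification model would do):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "dslim/bert-base-NER"  # example checkpoint, not the one I actually use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner = pipeline(
    task="token-classification",
    model=model,
    tokenizer=tokenizer,
    batch_size=128,   # how many inputs the pipeline groups per forward pass
    device="cpu",     # device=-1 also selects CPU on older transformers versions
)

sentences = ["Hugging Face is based in New York City."] * 512
results = ner(sentences)  # inputs are tokenized, padded, and forwarded in chunks of 128
print(results[0])
```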
If the above code were run on a GPU (changing device to 'cuda'), it would perform batched inference, unless the batch size is too large and it hits an OOM allocation error.
However, how would the same code behave on a CPU? How does the pipeline class natively handle batching when the CPU can't parallelise a batch the way a GPU does?
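One way I could probe this empirically is a small hypothetical micro-benchmark (reusing the model, tokenizer, and sentences from the sketch above) that times the same workload at a few batch sizes on CPU:

```python
import time

from transformers import pipeline

# Hypothetical micro-benchmark: time the same CPU workload at a few batch sizes.
# Reuses `model`, `tokenizer`, and `sentences` from the sketch above.
for bs in (1, 8, 32, 128):
    ner = pipeline(
        task="token-classification",
        model=model,
        tokenizer=tokenizer,
        batch_size=bs,
        device="cpu",
    )
    start = time.perf_counter()
    _ = ner(sentences)
    print(f"batch_size={bs}: {time.perf_counter() - start:.2f}s")
```

But even if the timings differ, that doesn't tell me what the pipeline is doing internally on CPU, which is what I'd like to understand.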