Recommended Hardware for NER Pipeline Model

I'm trying to get started running the pretrained NER pipeline model (https://huggingface.co/transformers/task_summary.html#named-entity-recognition) on about 10 million instances of text. Would you recommend using a CPU or GPU?

Previously, I used spaCy: I broke my data up into batches of 1k and ran those batches on a C5 instance on AWS. That EC2 instance was compute-optimized and had 96 cores. I'm able to use any instance type on AWS. Thanks!

If you have 10M examples and access to a GPU, then definitely use the GPU. If you want fast inference on CPU for the NER pipeline, you could try onnx_transformers, which provides the same API as `pipeline` but leverages ONNX for accelerated inference.
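A minimal sketch of what this could look like with the `transformers` pipeline, picking the GPU when one is available and feeding the corpus in batches (the `batches` helper and the batch size of 1000 are my own assumptions, mirroring the spaCy setup described above; the onnx_transformers alternative is shown commented out since its API may differ by version):

```python
from transformers import pipeline

import torch


def batches(items, size=1000):
    """Yield fixed-size chunks of a list, so 10M texts aren't passed at once."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    # device=0 selects the first GPU; -1 falls back to CPU
    device = 0 if torch.cuda.is_available() else -1
    ner = pipeline("ner", device=device)

    # CPU alternative (hypothetical usage, check the onnx_transformers docs):
    # from onnx_transformers import pipeline
    # ner = pipeline("ner", onnx=True)

    texts = ["Hugging Face Inc. is a company based in New York City."]
    for batch in batches(texts, size=1000):
        results = ner(batch)
        print(results)
```

Running the pipeline per batch rather than per document amortizes the per-call overhead, which matters at the 10M scale.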