How can I make model.generate() use multiple CPU cores?

Transformers is tuned for GPUs and multi-GPU setups, not for CPUs. Furthermore, Python itself is poorly suited to multi-threading because of the GIL, and multi-processing adds its own overhead.

However, since there is a lot of demand for CPU inference, there are various libraries for speeding it up (ONNX Runtime, OpenVINO, and llama.cpp, for example). They can be a little tricky to use, but I think they are worth trying out.
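That said, if you are staying with plain PyTorch, generate() already runs its tensor ops on an intra-op thread pool, and you can control its size. A minimal sketch (this assumes the PyTorch backend; `torch.set_num_threads` is the real PyTorch API, but the best thread count for your machine is something you'd need to benchmark):

```python
import os
import torch

# Size PyTorch's intra-op thread pool to the visible core count.
# generate() itself stays single-process; the matmuls inside it
# are what get parallelized across these threads.
n_cores = os.cpu_count() or 1
torch.set_num_threads(n_cores)

print(torch.get_num_threads())  # should now match n_cores

# After this, model.generate(...) on a CPU model will use the pool, e.g.:
# output_ids = model.generate(**inputs, max_new_tokens=50)
```

Note that more threads is not always faster: on small models, thread-synchronization overhead can dominate, so it is worth trying a few values (including the physical rather than logical core count).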
