Number of Inter and Intra-ops threads used by BERT models

Hakudoshi · August 15, 2022, 10:26pm

Hi everyone,
I’m trying to use a RoBERTa model for inference on CPU in a production environment. The model is trained in Python and then exported to a TorchScript model for inference in Java (using the libtorch library). During inference I see that multiple threads and cores are being utilized. When running on a machine with 40 cores, htop shows 21 threads in use and CPU utilization at around 1700%. On a 16-core machine I get around 450% CPU utilization. According to the PyTorch 1.12 documentation page "CPU threading and TorchScript inference", the default number of threads used by PyTorch for intra-op parallelism is the number of CPU cores. Since I don’t observe these numbers in htop (or with other monitoring tools), I wonder whether scripted models use a different default configuration, or whether transformers models, or the RoBERTa model specifically, are configured differently than PyTorch’s defaults (e.g., set_num_threads, OMP_NUM_THREADS, or MKL_NUM_THREADS is set somewhere).
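For reference, here is a minimal sketch of how the settings mentioned above could be inspected from the Python side. The helper name `thread_settings` is my own (hypothetical), and the script only reports what the current process sees; I'm assuming the Java/libtorch process may report something different. It reads the two environment variables and, if torch is importable, calls `torch.get_num_threads()` and `torch.get_num_interop_threads()`.

```python
import os

# Environment variables mentioned in the question.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def thread_settings():
    """Collect the thread-related settings visible to this process.

    Hypothetical helper for illustration; not part of PyTorch or transformers.
    """
    # os.cpu_count() reports logical CPUs, which may differ from physical cores.
    report = {"cpu_count": os.cpu_count()}
    for name in THREAD_ENV_VARS:
        report[name] = os.environ.get(name)  # None means the variable is unset
    try:
        import torch
    except ImportError:
        # Without torch installed, only the environment can be reported.
        return report
    report["intra_op_threads"] = torch.get_num_threads()
    report["inter_op_threads"] = torch.get_num_interop_threads()
    return report


if __name__ == "__main__":
    for key, value in thread_settings().items():
        print(f"{key}: {value}")
```

Running this once before and once after importing transformers and loading the model should show whether anything changes the defaults along the way.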

Can someone please shed light on this for me? Any help would be greatly appreciated!
