Number of inter-op and intra-op threads used by BERT models

Hi everyone,
I’m trying to use a RoBERTa model for inference on CPU in a production environment. The model is trained in Python and then exported to TorchScript for inference in Java (using the libtorch library). During inference I see multiple threads and cores being utilized: on a 40-core machine, htop shows 21 active threads and roughly 1700% CPU utilization, while on a 16-core machine I see around 450% CPU utilization.

According to this doc: CPU threading and TorchScript inference — PyTorch 1.12 documentation, the default number of threads PyTorch uses for intra-op parallelism is the number of CPU cores. Since that doesn’t match what I observe with htop (and other monitoring tools), I wonder whether scripted models use a different default configuration, or whether transformers models, and specifically RoBERTa, are configured differently than PyTorch’s defaults (e.g., set_num_threads, OMP_NUM_THREADS, or MKL_NUM_THREADS is set somewhere).
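For reference, here is a minimal sketch of how I understand the thread pools can be pinned explicitly via environment variables before launching the JVM that loads the TorchScript model, assuming libtorch picks up the OpenMP/MKL settings the same way the Python runtime does (the values and the jar name below are just placeholders, not recommendations):

```shell
# Sketch: pin PyTorch's intra-op parallelism before starting the JVM.
# libtorch's OpenMP/MKL backends read these at startup.
export OMP_NUM_THREADS=16   # intra-op threads for OpenMP-backed ops
export MKL_NUM_THREADS=16   # intra-op threads for MKL-backed ops

# hypothetical application jar that loads the scripted RoBERTa model
java -jar my-inference-app.jar
```

If the observed CPU utilization changes when these variables are set, that would suggest the defaults (rather than something baked into the scripted model) are governing the thread count.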

Can someone please shed light on this for me? Any help would be greatly appreciated!