How to enable use of all cpu cores?

hoovesmckenzee · October 27, 2022, 3:39am

I currently have the stable-diffusion-cpuonly version installed with learning pack 1.5, and I have noticed that it uses at most about 40% of the CPU and around 8-12 GB of RAM.

This system has 48 cores at 2.6 GHz, 64 GB of DDR4 ECC RAM, and a GeForce GTX 980 4 GB.

Is there a way to configure this to use all cpu cores, or use n cores?

BasToTheMax · November 1, 2022, 4:55pm

I have the same problem. Have you found a solution?

hoovesmckenzee · November 1, 2022, 8:15pm

I have yet to figure it out. I don't see anything in the code about cores, which makes me wonder whether the process itself just isn't that parallelizable. Out of 48 cores I use about 40%, which is around 20 cores, so I don't think it's a hyper-threading issue, or it would be capping out at 24 cores, I'd assume. I guess not as many people are doing this as I would have thought. I have no serious reason to use this program; I'm mostly just experimenting and trying different things with txt2img and img2img. It would be sweet to get it to use 100%, though. Perhaps someone more versed in it will chime in at some point. I'll keep watch on this thread.

Zelgodiz · April 16, 2025, 8:53pm

For PyTorch-Based Stable Diffusion (Most Common)

Put this at the top of your script:

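import os
import torch

# os.cpu_count() reports logical CPUs (hardware threads), not physical cores
num_cores = os.cpu_count() or 1

# Intra-op pool: threads used inside a single operator (e.g. a matmul)
torch.set_num_threads(num_cores)
# Inter-op pool (optional tuning); call this before any parallel work has started
torch.set_num_interop_threads(max(1, num_cores // 2))

print(f"🔧 Using {num_cores} CPU threads for PyTorch")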

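This sets the size of PyTorch's intra-op and inter-op thread pools for inference or training. It does not by itself guarantee 100% utilization, since some stages of the pipeline may not parallelize across every core.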
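The original question also asked about using n cores rather than all of them. Here is a minimal sketch of one way to pick that number, using only the standard library. The helper names `available_cpus` and `pick_thread_count` are hypothetical, and the affinity check assumes a platform that provides `os.sched_getaffinity` (for example Linux); elsewhere it falls back to `os.cpu_count()`.

```python
import os


def available_cpus() -> int:
    """Logical CPUs this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU pinning (e.g. taskset) where the platform supports it
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def pick_thread_count(requested=None) -> int:
    """Clamp a requested thread count to the range [1, available CPUs]."""
    limit = available_cpus()
    if requested is None:
        return limit
    return max(1, min(requested, limit))


if __name__ == "__main__":
    print(pick_thread_count())    # all CPUs available to this process
    print(pick_thread_count(16))  # at most 16
```

The result can be passed to torch.set_num_threads(...) in place of a hard-coded count.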

For Diffusers (Hugging Face’s `diffusers` library)

If you load the model with `from diffusers import StableDiffusionPipeline`, you can combine the thread settings above with:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")  # ensure it's CPU-only

Then set the CPU thread count as above with torch.set_num_threads(...) before you run the pipeline.
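More threads do not always mean faster generation, so it can be worth measuring before settling on a number. Here is a minimal sketch of a timing harness that uses only the standard library. `time_settings`, `fastest`, and `fake_run` are hypothetical names. `fake_run` is a stand-in for a function that would call torch.set_num_threads(n) and run the pipeline once.

```python
import time


def time_settings(settings, run, repeats=3):
    """Return {setting: best wall-clock seconds} for run(setting)."""
    results = {}
    for setting in settings:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(setting)
            best = min(best, time.perf_counter() - start)
        results[setting] = best
    return results


def fastest(results):
    """Setting with the smallest measured time."""
    return min(results, key=results.get)


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with a function that sets
    # the thread count and runs one real generation.
    def fake_run(n_threads):
        time.sleep(0.02 / n_threads)

    timings = time_settings([1, 2, 4], fake_run)
    print(fastest(timings), timings)
```

Taking the best of several repeats reduces noise from other processes. On a shared machine the numbers will still vary between runs.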

For ONNX Runtime Backends (if used)

If you’re using ONNX to accelerate Stable Diffusion (common in onnxruntime CPU-optimized builds):

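import os
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = os.cpu_count()  # Max parallelism inside each operator
# inter_op_num_threads only has an effect when execution_mode is ORT_PARALLEL
sess_options.inter_op_num_threads = max(1, os.cpu_count() // 2)

ort_session = ort.InferenceSession("model.onnx", sess_options, providers=["CPUExecutionProvider"])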

Optional: Set Env Variables (Can Help PyTorch/ONNX)

Set before running your script:

export OMP_NUM_THREADS=$(nproc)
export MKL_NUM_THREADS=$(nproc)

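Or in Python, before importing torch or onnxruntime (the OpenMP and MKL runtimes typically read these variables when they are first loaded):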

import os
# Run this before importing torch or onnxruntime, or it may have no effect
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count())
os.environ["MKL_NUM_THREADS"] = str(os.cpu_count())
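Because these variables are usually read when the numerical libraries first load, one way to avoid import-order problems is to launch the script as a child process with the environment already set. Here is a minimal sketch that uses only the standard library. `thread_env` and `run_with_threads` are hypothetical helper names, and the one-line child program is a stand-in for your own script.

```python
import os
import subprocess
import sys


def thread_env(n_threads, base=None):
    """Copy of the environment with the thread-count variables set."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(n_threads)
    env["MKL_NUM_THREADS"] = str(n_threads)
    return env


def run_with_threads(args, n_threads):
    """Run a child Python process that sees the variables from its first import."""
    return subprocess.run(
        [sys.executable, *args],
        env=thread_env(n_threads),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for your script (assumption): the child echoes what it received
    child = ["-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_threads(child, 8).stdout.strip())
```

This has the same effect as the shell `export` lines, but keeps the setting next to the code that launches the job.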

Final Tip: Batch Your Requests

Stable Diffusion on CPU can also be made more efficient by batching, that is, generating multiple images per pass (if memory allows):

# .images holds the generated images, four per prompt here
images = pipe(prompt="a futuristic AI core", num_images_per_prompt=4).images
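If the number of images you want does not fit in memory in one pass, the work can be split into several passes. Here is a minimal sketch of that planning step. `plan_batches` is a hypothetical helper, and the maximum per pass is an assumption you would find by trial on your own machine.

```python
def plan_batches(total_images, max_per_pass):
    """Split total_images into per-pass batch sizes no larger than max_per_pass."""
    if total_images < 0 or max_per_pass < 1:
        raise ValueError("total_images must be >= 0 and max_per_pass >= 1")
    full, remainder = divmod(total_images, max_per_pass)
    return [max_per_pass] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    for batch_size in plan_batches(10, 4):
        # In a real loop this would be something like:
        #   images += pipe(prompt=..., num_images_per_prompt=batch_size).images
        print(batch_size)
```

Each value returned by `plan_batches` can be passed as num_images_per_prompt for one call to the pipeline.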

John6666 · April 17, 2025, 3:43am

If you want to use GPU models efficiently on a CPU, it is relatively easy to use ONNX (introduced above) or GGUF.
