How to enable use of all CPU cores?

I currently have the stable-diffusion-cpuonly version installed with learning pack 1.5, and I've noticed that it only uses at most ~40% of the CPU and around 8–12 GB of RAM.

This system has 48 cores at 2.6 GHz, 64 GB DDR4 ECC RAM, and a GeForce GTX 980 4 GB.

Is there a way to configure this to use all CPU cores, or to use n cores?


I have the same problem. Have you found a solution?


I have yet to figure it out. I don't see anything in the code about cores, which makes me wonder if the process itself isn't that parallelizable. Out of 48 cores I use about 40%, which is around 20 cores, so I don't think it's a hyper-threading issue; if it were, I'd assume it would cap out at 24 cores. I guess not as many people are doing this as I would have thought. I have no serious reason to use this program, mostly just experimenting with txt2img and img2img, but it would be sweet to get it to use 100%. Perhaps someone more versed in it will chime in at some point; I'll keep watch on this thread.
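
One quick way to sanity-check the hyper-threading theory is to compare the logical and physical core counts. A minimal sketch, assuming the third-party `psutil` package is available (`pip install psutil`):

```python
import os

import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # logical cores (includes hyper-threads)
physical = psutil.cpu_count(logical=False)  # physical cores only

print(f"Logical cores: {logical}, physical cores: {physical}")
```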


:white_check_mark: For PyTorch-Based Stable Diffusion (Most Common)

Put this at the top of your script:

```python
import torch
import os
import multiprocessing

# Use all logical CPU cores
num_cores = multiprocessing.cpu_count()
torch.set_num_threads(num_cores)
torch.set_num_interop_threads(max(1, num_cores // 2))  # Optional tuning

print(f"🔧 Using {num_cores} CPU threads for PyTorch")
```

This configures PyTorch's intra-op and inter-op thread pools so CPU inference (or training) can use every core. Note that `torch.set_num_interop_threads` can only be called once, before any parallel work starts, which is why this belongs at the top of the script.
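
To confirm the settings took effect, you can read the values back and watch a CPU-bound op in your system monitor. A minimal sketch (the matrix size is arbitrary):

```python
import time

import torch

print(torch.get_num_threads())          # intra-op threads actually in use
print(torch.get_num_interop_threads())  # inter-op threads

# A large matmul should light up most cores in your system monitor
x = torch.randn(4096, 4096)
start = time.time()
y = x @ x
print(f"matmul took {time.time() - start:.2f}s")
```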


:white_check_mark: For Diffusers (Hugging Face’s diffusers library)

If you’re using `from diffusers import StableDiffusionPipeline`, you can combine this with:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")  # ensure it's CPU-only
```

And then set the CPU thread count as above with `torch.set_num_threads(...)`; a combined sketch is below.
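
A minimal end-to-end sketch (the prompt is just an example; the thread settings go first, before any inference work starts):

```python
import multiprocessing

import torch
from diffusers import StableDiffusionPipeline

# Configure thread pools before any inference work starts
num_cores = multiprocessing.cpu_count()
torch.set_num_threads(num_cores)

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")  # CPU-only inference

image = pipe(prompt="a futuristic AI core").images[0]
image.save("output.png")
```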


:white_check_mark: For ONNX Runtime Backends (if used)

If you’re using ONNX to accelerate Stable Diffusion (common with `onnxruntime` CPU-optimized builds):

```python
import os

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = os.cpu_count()  # max parallelism within an op
sess_options.inter_op_num_threads = max(1, os.cpu_count() // 2)

ort_session = ort.InferenceSession(
    "model.onnx", sess_options, providers=["CPUExecutionProvider"]
)
```
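
If you'd rather not drive the ONNX graph by hand, Hugging Face's Optimum library wraps Stable Diffusion in an ONNX Runtime pipeline. A rough sketch, assuming `optimum[onnxruntime]` is installed and that your Optimum version accepts `session_options` in `from_pretrained` (check the Optimum docs for your version):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# export=True converts the PyTorch weights to ONNX on first load
pipe = ORTStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    export=True,
    session_options=sess_options,  # reuse the thread settings from above
)
image = pipe(prompt="a futuristic AI core").images[0]
```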

:gear: Optional: Set Env Variables (Can Help PyTorch/ONNX)

Set these before running your script:

```bash
export OMP_NUM_THREADS=$(nproc)
export MKL_NUM_THREADS=$(nproc)
```

Or in Python, before `torch` or `numpy` is imported (OpenMP/MKL read these variables when the libraries load):

```python
import os

num_cores = os.cpu_count()
os.environ["OMP_NUM_THREADS"] = str(num_cores)
os.environ["MKL_NUM_THREADS"] = str(num_cores)
```

:test_tube: Final Tip: Batch Your Requests

Stable Diffusion on CPU can also be made more efficient by batching, i.e. generating multiple images per pass (if memory allows):

```python
# Four images in one pass; peak memory grows with the batch size
images = pipe(prompt="a futuristic AI core", num_images_per_prompt=4).images
```

If you want to run GPU-oriented models efficiently on a CPU, it is relatively easy to use ONNX (covered above) or GGUF.
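
For the GGUF route, stable-diffusion.cpp runs quantized checkpoints on CPU and takes a thread-count option. A rough sketch using its Python bindings (`pip install stable-diffusion-cpp-python`), assuming the constructor exposes `model_path` and `n_threads` parameters mirroring the underlying C API; check the project's README for the exact signature, and note the model filename here is a placeholder:

```python
from stable_diffusion_cpp import StableDiffusion

# n_threads=-1 conventionally means "use all available cores" in llama.cpp-style APIs
sd = StableDiffusion(model_path="sd-v1-5-Q8_0.gguf", n_threads=-1)
images = sd.txt_to_img(prompt="a futuristic AI core")
images[0].save("gguf_output.png")
```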