I currently have the stable-diffusion-cpuonly version installed with learning pack 1.5, and I've noticed that it only uses about 40% of the CPU at most and around 8-12 GB of RAM.
This system has 48 cores at 2.6 GHz, 64 GB of DDR4 ECC RAM, and a GeForce GTX 980 4 GB.
Is there a way to configure it to use all CPU cores, or a chosen number n of cores?
I have the same problem. Have you found a solution?
I have yet to figure it out. I don't see anything in the code about cores, which makes me wonder whether the process itself just isn't that parallelizable. Out of 48 cores I use about 40%, which is around 20 cores, so I don't think it's a hyper-threading issue, or I'd assume it would cap out at 24 cores. I guess not as many people are doing this as I would have thought. I have no serious reason to use this program; I'm mostly just experimenting and trying different things with txt2img and img2img. It would be sweet to get it to use 100% though. Perhaps someone more versed in it will chime in at some point; I'll keep watch on this thread.
For PyTorch-Based Stable Diffusion (Most Common)
Put this at the top of your script:
```python
import multiprocessing
import os

import torch

# Use all logical CPU cores (this count includes hyper-threads)
num_cores = multiprocessing.cpu_count()
torch.set_num_threads(num_cores)  # threads used inside a single operation
# Optional tuning; must be called before any inter-op parallel work starts
torch.set_num_interop_threads(max(1, num_cores // 2))
print(f"🔧 Using {num_cores} CPU threads for PyTorch")
```
This lets PyTorch use up to that many threads for inference or training; how busy the CPU actually gets still depends on how well each operation parallelizes.
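To cover the original question about using only n cores rather than all of them, the thread counts can be clamped before they are handed to PyTorch. The following is a minimal sketch, not a definitive recipe; the helper names `pick_thread_counts` and `apply_to_torch` are hypothetical and do not come from PyTorch or Stable Diffusion.

```python
import os


def pick_thread_counts(requested=None, logical_cores=None):
    """Return (intra_op, inter_op) thread counts, clamped to the machine.

    requested=None means "use every logical core".
    """
    logical = logical_cores or os.cpu_count() or 1
    if requested is None:
        intra = logical
    else:
        intra = max(1, min(requested, logical))
    inter = max(1, intra // 2)  # same halving heuristic as the snippet above
    return intra, inter


def apply_to_torch(intra, inter):
    """Apply the counts to PyTorch and return the intra-op count now in effect."""
    import torch  # imported lazily so the helper above works without PyTorch

    torch.set_num_threads(intra)
    try:
        torch.set_num_interop_threads(inter)
    except RuntimeError:
        # PyTorch refuses this once inter-op parallel work has already started
        pass
    return torch.get_num_threads()
```

For example, `pick_thread_counts(24, 48)` gives `(24, 12)`. Whether 24 threads beat 48 on a hyper-threaded machine is something to measure rather than assume.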
For Diffusers (Hugging Face’s diffusers library)
If you’re using from diffusers import StableDiffusionPipeline, you can combine this with:
```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")  # ensure it's CPU-only
```
Then set the CPU thread count as above with torch.set_num_threads(...).
For ONNX Runtime Backends (if used)
If you’re using ONNX to accelerate Stable Diffusion (common in onnxruntime CPU-optimized builds):
```python
import os

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = os.cpu_count()  # Max parallelism within an operator
# Only takes effect when execution_mode is set to ORT_PARALLEL
sess_options.inter_op_num_threads = max(1, os.cpu_count() // 2)
ort_session = ort.InferenceSession("model.onnx", sess_options)
```
Optional: Set Env Variables (Can Help PyTorch/ONNX)
Set before running your script:
```bash
export OMP_NUM_THREADS=$(nproc)
export MKL_NUM_THREADS=$(nproc)
```
Or in Python, before importing torch:
```python
import os

num_cores = os.cpu_count()
os.environ["OMP_NUM_THREADS"] = str(num_cores)
os.environ["MKL_NUM_THREADS"] = str(num_cores)
```
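Because these variables are normally read when the OpenMP/MKL runtime starts up, the order of operations matters: set them first, import the heavy libraries afterwards. Here is a small sketch of that ordering, assuming a hypothetical helper named `set_thread_env` (not part of any library):

```python
import os


def set_thread_env(num_threads):
    """Set OpenMP/MKL thread limits; call before importing torch or numpy."""
    value = str(max(1, int(num_threads)))
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[name] = value
    return value


# Set the limits first ...
set_thread_env(os.cpu_count() or 1)
# ... and only then import the heavy libraries, e.g.:
# import torch
```

Passing a smaller number, such as `set_thread_env(24)`, is one way to restrict the run to n threads instead of all of them.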
Final Tip: Batch Your Requests
Stable Diffusion on CPU can also be made more efficient by batching, that is, generating multiple images per pass (if memory allows):
```python
images = pipe(prompt="a futuristic AI core", num_images_per_prompt=4).images
```
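Batching also applies when you have several different prompts rather than several images of one prompt, since the diffusers pipeline accepts a list of prompts in a single call. A rough sketch of splitting a prompt list into memory-sized batches follows; the helper `chunk_prompts` and the two extra example prompts are illustrative assumptions, not anything from the thread.

```python
def chunk_prompts(prompts, batch_size):
    """Yield successive lists of at most batch_size prompts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


prompts = ["a futuristic AI core", "a rainy city street", "a desert at dawn"]
batches = list(chunk_prompts(prompts, 2))

# Each batch could then be passed to the pipeline in one call, e.g.:
# for batch in batches:
#     images = pipe(prompt=batch).images
```

A larger batch size means more RAM per pass, so it is worth starting small and increasing it while watching memory use.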
If you want to run models built for GPUs efficiently on a CPU, converting them to ONNX (introduced above) or GGUF is relatively easy.