I currently have the stable-diffusion-cpuonly version installed with learning pack 1.5 and have been noticing that it only uses at most ~40% of the CPU and around 8–12 GB of RAM.
This system has 48 cores at 2.6 GHz, 64 GB DDR4 ECC RAM, and a GeForce GTX 980 4 GB.
Is there a way to configure this to use all CPU cores, or to use n cores?
I have the same problem. Have you found a solution?
I have yet to figure it out. I don't see anything in the code about core counts, which makes me wonder if the process itself isn't that parallelizable. Out of 48 cores I use about 40%, which is around 20 cores, so I don't think it's a hyper-threading issue; I'd assume it would be capping out at 24 cores if it were. I assume not as many people are doing this as I would think. I have no serious reason to use this program, mostly just experimenting and trying different stuff with txt2img and img2img, but it would be sweet to get it to use 100%. Perhaps someone more versed in it will chime in at some point; I'll keep watch on this thread.
For PyTorch-Based Stable Diffusion (Most Common)
Put this at the top of your script:
```python
import torch
import os
import multiprocessing

# Use all logical CPU cores
num_cores = multiprocessing.cpu_count()
torch.set_num_threads(num_cores)
# Must be called before any inter-op parallel work starts
torch.set_num_interop_threads(max(1, num_cores // 2))  # optional tuning
print(f"🔧 Using {num_cores} CPU threads for PyTorch")
```
This sets PyTorch's intra-op thread pool to use every core for inference or training; actual utilization still depends on how parallelizable each operation in the sampling loop is.
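A quick way to confirm the settings take effect is to time a large matrix multiply and watch CPU usage while it runs (a hypothetical sanity check, not part of the original script):

```python
import time
import torch

x = torch.randn(4096, 4096)
start = time.time()
for _ in range(10):
    torch.mm(x, x)  # large matmuls should saturate the intra-op thread pool
print(f"10 matmuls took {time.time() - start:.2f}s on {torch.get_num_threads()} threads")
```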
For Diffusers (Hugging Face’s diffusers library)
If you’re using `from diffusers import StableDiffusionPipeline`, you can combine this with:
```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")  # ensure it's CPU-only
```
Then set the thread count as above with `torch.set_num_threads(...)` before calling the pipeline.
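Putting the two together, a minimal end-to-end sketch (model ID from above; the prompt and step count are just placeholders):

```python
import multiprocessing

import torch
from diffusers import StableDiffusionPipeline

# Claim all logical cores before generation starts
torch.set_num_threads(multiprocessing.cpu_count())

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")

# Fewer steps keeps CPU runs tolerable while experimenting
image = pipe("a futuristic AI core", num_inference_steps=25).images[0]
image.save("output.png")
```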
For ONNX Runtime Backends (if used)
If you’re using ONNX to accelerate Stable Diffusion (common in onnxruntime CPU-optimized builds):
```python
import os

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = os.cpu_count()  # max parallelism within an op
sess_options.inter_op_num_threads = max(1, os.cpu_count() // 2)
ort_session = ort.InferenceSession("model.onnx", sess_options)
```
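Running the session then looks roughly like this; the input names and shapes depend entirely on how the model was exported, so inspect them first (the feed dict below is hypothetical):

```python
import numpy as np

# Inspect what the exported model expects
for inp in ort_session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Hypothetical feed; replace names/shapes with those printed above
feed = {"sample": np.zeros((1, 4, 64, 64), dtype=np.float32)}
outputs = ort_session.run(None, feed)
```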
Optional: Set Env Variables (Can Help PyTorch/ONNX)
Set these before running your script:
```bash
export OMP_NUM_THREADS=$(nproc)
export MKL_NUM_THREADS=$(nproc)
```
Or in Python, reusing `num_cores` from the PyTorch snippet above:
```python
os.environ["OMP_NUM_THREADS"] = str(num_cores)
os.environ["MKL_NUM_THREADS"] = str(num_cores)
```
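Note that from Python these assignments only reliably take effect if they run before torch (or numpy) is first imported, because the OpenMP/MKL runtimes read the variables at load time. A minimal sketch of the safe ordering:

```python
import os

# Set thread-count env vars first...
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count())
os.environ["MKL_NUM_THREADS"] = str(os.cpu_count())

# ...then import torch so the OpenMP/MKL runtimes pick them up
import torch
```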
Final Tip: Batch Your Requests
Stable Diffusion on CPU can also be made more efficient by batching, i.e. generating multiple images per pass (if memory allows):
```python
pipe(prompt="a futuristic AI core", num_images_per_prompt=4)
```
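The call returns all the images in one output object; a small follow-up sketch for collecting them (file names are arbitrary):

```python
result = pipe(prompt="a futuristic AI core", num_images_per_prompt=4)
for i, image in enumerate(result.images):
    image.save(f"image_{i}.png")
```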
If you want to run GPU models efficiently on a CPU, it is relatively easy to use ONNX (introduced above) or GGUF.
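For the GGUF route, stable-diffusion.cpp is the usual runner; a rough sketch, assuming you have built its `sd` binary and downloaded a GGUF-quantized checkpoint (flag names may differ between versions, so check `sd --help`):

```bash
# -t sets the thread count; here, one thread per core
./sd -m sd-v1-5-q8_0.gguf -p "a futuristic AI core" -t $(nproc)
```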