🚀 Bringing Supercomputer-Grade AI Performance to Local CPUs: Purem Benchmarks Now Public

Hi everyone,

We are excited to announce that on April 28, 2025, we are publicly launching Purem – a CPU-accelerated AI engine that achieves supercomputer-grade performance directly on local devices such as MacBooks (Apple M1–M4).

Purem’s softmax kernel reaches ~6500 ops/sec on Apple M2 CPUs, measured in real large-scale benchmarks.
For comparison, well-known industry kernels measured on CPU:
• FlashAttention achieves ~800–5000 ops/sec.
• Meta’s xFormers (PyTorch 2.0) achieves ~1300–1500 ops/sec.
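The post doesn’t specify the exact workload behind these numbers, so treat shapes and repeat counts below as illustrative assumptions. If you want a rough, reproducible point of comparison on your own CPU, a minimal NumPy baseline for timing softmax calls per second might look like this:

```python
import time
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax along the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract row max to avoid overflow
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def ops_per_sec(rows: int = 1024, dim: int = 4096, repeats: int = 50) -> float:
    """Time repeated batched softmax calls and report calls per second.

    The (rows, dim) shape is an assumption, not the workload used in
    Purem's published benchmarks.
    """
    x = np.random.default_rng(0).standard_normal((rows, dim)).astype(np.float32)
    softmax(x)  # warm-up call so one-time costs don't skew the timing
    start = time.perf_counter()
    for _ in range(repeats):
        softmax(x)
    elapsed = time.perf_counter() - start
    return repeats / elapsed

if __name__ == "__main__":
    print(f"{ops_per_sec():.1f} softmax calls/sec")
```

Absolute numbers from a sketch like this are not directly comparable to a tuned kernel, but it makes the methodology concrete: same input, repeated calls, wall-clock timing.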

Purem brings industrial-grade AI performance without requiring GPUs, specialized cloud services, or complex hardware setups – running natively on consumer CPUs.

📊 Benchmark Highlights

  • Fully local execution
  • No cloud dependencies
  • O(N) linear-time performance
  • Transparent memory management
  • Deterministic speed without drift
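The O(N) and memory-transparency claims map onto softmax’s structure: a fixed number of linear passes over the input (row max, shifted exponentials, normalization), which can write into a caller-provided buffer so allocation behavior stays explicit. Purem’s internals aren’t shown in this post; the following is a generic NumPy sketch of that pattern, with the function name and signature our own:

```python
import numpy as np

def softmax_into(x: np.ndarray, out: np.ndarray) -> np.ndarray:
    """Softmax in a constant number of O(N) passes, written into a
    caller-owned buffer (only small per-row reductions are allocated)."""
    m = x.max(axis=-1, keepdims=True)       # pass 1: per-row max, for stability
    np.subtract(x, m, out=out)              # pass 2a: shift rows in place
    np.exp(out, out=out)                    # pass 2b: exponentiate in place
    out /= out.sum(axis=-1, keepdims=True)  # pass 3: normalize in place
    return out
```

Because every pass touches each element a bounded number of times and the output buffer is supplied by the caller, both runtime and memory behavior are predictable – the properties the bullet list above describes.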

Purem enables developers and researchers to test, validate, and deploy AI workflows locally – with real production-grade speed.

🛠️ Context and Motivation

Today, local CPU performance for AI models is often treated as secondary compared to GPU acceleration.
We believe this needs to change.
With the right architecture, CPUs can match – and sometimes even outperform – traditional GPU solutions, especially in the context of edge computing, early prototyping, and resource-efficient inference.

Purem is designed for:

  • ML researchers
  • AI startups
  • Edge AI applications
  • Cost-sensitive deployments
  • Local LLM fine-tuning and validation

We are making the free version permanently available, enabling unrestricted benchmarking and real-world local testing.

💬 Discussion

What are your thoughts on bridging the gap between GPU and CPU compute for scalable AI workloads?
Do you see value in bringing production-grade AI execution directly onto consumer CPUs?

We would love to hear your feedback, insights, and suggestions! 🚀

(If you’re interested, detailed benchmark data, architectural notes, and upcoming development plans are available at https://worktif.com.)

🛡️ TL;DR

  • Purem: Supercomputer-grade AI kernel running locally.
  • Softmax on CPU reaching ~6500 ops/sec.
  • Free demo goes live April 28.
  • Ready to unlock AI potential for everyone.

#cpu #optimization #ai #benchmark #acceleration