I made some open-source software to run UNQUANTIZED Mistral-7B-Instruct on about 2GB of RAM

Yes, this is actually real.
You can now run unquantized Mistral-7B-Instruct on about 2GB of RAM.

First off, here is the GitHub link: https://github.com/garagesteve1155/ai_splitter (Run Mistral 7B Instruct on less than 4GB of RAM)

Here is a video explanation of how it works: https://youtu.be/53Do4whfrqE?si=yMmy5iavmzw_qNgv

Most solutions (Ollama, llama.cpp, etc.) require quantization to get large models working on small hardware. This one doesn’t.

ai_splitter uses a simple but effective layer-by-layer loading strategy that keeps memory usage low (~2GB) while still using the full FP16 .safetensors weights from Hugging Face.
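To make that concrete, here is a minimal sketch of the layer-streaming idea, assuming a Hugging Face-style .safetensors checkpoint. This is my own illustration, not the actual ai_splitter code; `CHECKPOINT`, `NUM_LAYERS`, and `apply_layer` are placeholders.

```python
# Minimal sketch of layer streaming -- NOT the actual ai_splitter code.
# CHECKPOINT, NUM_LAYERS, and apply_layer are illustrative placeholders.
import gc
from safetensors import safe_open

CHECKPOINT = "model.safetensors"  # hypothetical single-file FP16 checkpoint
NUM_LAYERS = 32                   # Mistral-7B-Instruct has 32 decoder layers

def load_layer_weights(layer_idx):
    """Read only one decoder layer's tensors from disk into RAM."""
    prefix = f"model.layers.{layer_idx}."
    weights = {}
    with safe_open(CHECKPOINT, framework="pt", device="cpu") as f:
        for name in f.keys():
            if name.startswith(prefix):
                # Full FP16 tensor straight from the .safetensors file:
                weights[name[len(prefix):]] = f.get_tensor(name)
    return weights

def run_decoder(hidden_states, apply_layer):
    """Stream the layers: only one layer is resident at a time, so peak
    RAM is roughly one FP16 layer (~0.4GB) plus activations."""
    for i in range(NUM_LAYERS):
        weights = load_layer_weights(i)
        hidden_states = apply_layer(hidden_states, weights)
        del weights
        gc.collect()  # free the layer before loading the next one
    return hidden_states
```

The disk holds the full FP16 model while RAM only ever holds one layer at a time, which is exactly why this trades speed for memory.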

Why it matters:

- No GPU needed
- No quantization
- Works on super low-end hardware like a Pi 5 or mini PCs
- Only requires ~2GB RAM (see the back-of-envelope math after this list)
- CPU-only, pure Python
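Rough numbers behind the ~2GB figure, using the public Mistral-7B config (32 layers, hidden size 4096, vocab 32000). These are my own estimates, not measurements from the repo:

```python
# Back-of-envelope peak-RAM estimate for layer streaming (my numbers from
# the public Mistral-7B config, not measurements from ai_splitter).
BYTES_FP16 = 2
params_per_layer = 218e6                      # ~7.2B total params / 32 layers
layer_ram = params_per_layer * BYTES_FP16     # ~0.44 GB: one decoder layer
embed_ram = 32_000 * 4096 * BYTES_FP16        # ~0.26 GB: token embeddings
lm_head_ram = 32_000 * 4096 * BYTES_FP16      # ~0.26 GB: output projection
overhead = 0.8e9                              # activations, KV cache, runtime (guess)
peak_gb = (layer_ram + embed_ram + lm_head_ram + overhead) / 1e9
print(f"~{peak_gb:.1f} GB peak")              # ~1.8 GB, in line with the ~2GB claim
```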

More supported models are coming soon.

Yes, it’s SLOW!!! Hahaha… but it works. Bout to boot up my Pi lol


Hello, I need residential proxies to help me collect and crawl data for building AI models. Do you have any good suggestions?
