Yes, this is actually real.
You can now run unquantized Mistral-7B-Instruct on about 2 GB of RAM.
First off, here is the GitHub link: GitHub - garagesteve1155/ai_splitter: Run Mistral 7b Instruct on less than 4gb of RAM
Here is a video explanation of how it works: https://youtu.be/53Do4whfrqE?si=yMmy5iavmzw_qNgv
Most solutions (Ollama, llama.cpp, etc.) require quantization to get large models working on small hardware. This one doesn’t.
ai_splitter uses a simple but effective layer-by-layer loading strategy that keeps memory usage low (~2 GB) while still using the full FP16 .safetensors weights from Hugging Face.
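To make the idea concrete, here is a rough Python sketch of what layer-by-layer streaming from safetensors shards can look like. This is not the actual ai_splitter code: the shard filenames, the layer count constant, and the run_decoder_layer helper are illustrative placeholders for how the pattern keeps peak RAM near one layer's weights plus activations.

```python
# Rough sketch of the layer-streaming idea (not the actual ai_splitter code).
# Assumes a Llama/Mistral-style checkpoint split across .safetensors shards.
# PyTorch must be installed for framework="pt" tensors.
import gc
from safetensors import safe_open

SHARDS = [                                   # illustrative shard paths
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
]
NUM_LAYERS = 32                              # Mistral-7B has 32 decoder layers

def load_tensors(prefix):
    """Pull only the tensors whose names start with `prefix` into RAM."""
    weights = {}
    for shard in SHARDS:
        with safe_open(shard, framework="pt", device="cpu") as f:
            for name in f.keys():
                if name.startswith(prefix):
                    weights[name] = f.get_tensor(name)   # full FP16, no quantization
    return weights

def forward(hidden_states, run_decoder_layer):
    # Stream the model one decoder layer at a time: load, apply, free.
    # run_decoder_layer is a hypothetical helper that applies one layer's
    # attention + MLP given its weight dict.
    for i in range(NUM_LAYERS):
        layer_weights = load_tensors(f"model.layers.{i}.")
        hidden_states = run_decoder_layer(hidden_states, layer_weights)
        del layer_weights        # drop this layer's ~0.4 GB of FP16 weights
        gc.collect()             # keep peak memory near one layer + activations
    return hidden_states
```

The trade-off is obvious: every token re-reads the weights from disk, which is why it's slow, but total RAM never needs to hold the whole 14 GB of FP16 weights at once.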
Why it matters:
No GPU needed
No quantization
Works on super low-end hardware like Pi 5 or mini PCs
Only requires ~2 GB of RAM
CPU-only, pure Python
More supported models are coming soon.
Yes, it’s SLOW!!! Hahaha… but it works. Bout to boot up my pi lol