Best Small LLM For Rag

Siyam · March 4, 2025, 9:22am

Which Is the best small model (3b) for rag, I am building a rag and using mistral-nemo 12b for it, i have tested many other model but not getting expected output like mistral nemo providing, nemo exactly follow system prompt but i can’t find any 3b model which exactly follow my system prompt, its normal that nemo is 12b model so it works better than any 3b model, but in my case i don’t want my model to have a large knowledge base outside my domain (200 pdf’s), and i want it to be super fast …
i am currently using Ollama ,please suggest the best 2-4b model for rag , smaller is better

John6666 · March 4, 2025, 10:14am

If you want a model that is as versatile as possible in that size range, I recommend these models.

anon69948774 · March 13, 2025, 12:02pm

Among the 7 or 8B models, Ministral instruct 2410 GGUF is the best for me in french (IQ4 XS is small), so it’s probably also the best among the 3Bs.

For local PDF GPT4all is interesting, LocalDocs is efficient.

Akjava · March 13, 2025, 12:27pm

I’m using granite3.2:8b for rag.granite3.2:2b is good as 2B.
but I’m not sure if the model can understand the system prompt provided by you.

John6666 · March 13, 2025, 2:10pm

Gemma 3 has been released, and 4B and 1B are in the lineup.
This seems to fit this use case.

Topic		Replies	Views
Looking for a Tiny LLM (max 1.5GB) – Need Advice Models	6	7843	December 6, 2024
Best LLMs that can run on 4gb VRAM Beginners	2	3089	January 22, 2025
Best model for file scan and personality Models	1	84	March 14, 2025
Best Open Source Models for English to Japanese Models	5	155	June 24, 2025
Lama 3.23b performs great when I download and use using ollama but when I manually download the model or if I use the gguf model by unsloth, it gives me irrelevant response. Please help me out Beginners	9	1350	October 31, 2024

Best Small LLM For Rag

Related topics