Mistral-7B-Instruct-v0.3 vs Mistral-NEMO-12B

Mistral-NEMO-12B is supposed to be better than Mistral-7B-Instruct-v0.3, since it has more parameters (12B). I used Mistral-7B-Instruct-v0.3 to implement a RAG system and deployed the same code on both Render and Vercel. When NEMO came out, I changed the model of the one on Render to NEMO but left the one on Vercel unchanged. I then watched the performance of the two applications, identical in every way except for the model. The one with NEMO fails often: it is intolerant of even grammatical or punctuation errors, and it often gives a literal interpretation of a prompt. For instance, the follow-up prompt “Quote source” (i.e., asking it to quote the source of its last response) may simply return “Source”. 7B-Instruct, however, handles all of these situations well.


Hi @Ade-crown,
Could it be a tokenization or embedding issue?

I changed the model of the one on Render to NEMO

Did you just change the model, or did you re-run the whole pipeline (chunking, indexing, etc.) for the new setup?

I just changed the model, nothing else. I don’t think there’s any need to touch the database, since the model that embeds queries for the database is a separate model entirely. The only role the Mistral model plays in the process is to use the context provided to it to give an intelligent response to the user’s queries.
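For anyone following along, here is a minimal sketch of that separation (the helper functions are placeholders, not my actual code): the embedding model that builds and queries the vector store never changes, so swapping the generator from 7B-Instruct to NEMO is a one-line config change and leaves the index untouched.

```python
# Minimal sketch, not the actual deployment code: embed() and generate() are
# hypothetical stand-ins for whatever embedding and chat endpoints you call.
import numpy as np

EMBEDDING_MODEL = "mistral-embed"              # fixed -> the stored index stays valid
GENERATION_MODEL = "mistral-7b-instruct-v0.3"  # the only thing that differs per deployment


def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per input text."""
    raise NotImplementedError


def generate(model: str, prompt: str) -> str:
    """Placeholder: call the chat/completion endpoint for the given model."""
    raise NotImplementedError


def answer(question: str, index_vectors: np.ndarray, chunks: list[str]) -> str:
    # 1. Embed the query with the same embedding model used at indexing time.
    q = embed([question])[0]
    # 2. Rank stored chunks by cosine similarity and keep the top 3.
    sims = index_vectors @ q / (np.linalg.norm(index_vectors, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-3:])
    # 3. Only this step sees the generator, so it can be swapped without re-indexing.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(GENERATION_MODEL, prompt)
```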


Can confirm, Mistral 7b is better.

I always try the newer small models when they come out, thinking I’ll be able to replace Mistral 7b - I’ve tried Llama 3, DeepSeek R1, and several other models between 2B and 13B parameters… Mistral 7b is always the best.

I’ve even compared Mistral 7b against other models by Mistral, and 7b still outperforms them - and it’s fast!

I too have experienced other models hanging or outputting obviously erroneous responses.

As somebody who relies heavily on text-to-text models for a business use case (invoice automation), I’ve found nothing as solid or as good as Mistral 7b.

The only text-to-text models I can honestly say are “better” are the ones Google deploys for its AI Overviews and the ones in production at OpenAI. Those are of course the best models - but who knows what they cost to run.

Mistral 7b continues to be the best choice in 2025 for most use cases.


Really? Would you say that quantization plays a role, or did you keep it controlled during your tests against the larger Mistral models? And do you use it raw for these tasks, or fine-tuned? Of course a fine-tuned smaller model will be a better fit for a specific task than a generalized larger one. So I’m wondering whether the dataset I just published would do better with Mistral 7B or Magistral 24B.

I mean, is the leap worth it? Or, as long as both are fine-tuned, will the results be similar?
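If it helps, here is a rough sketch of how I would keep that comparison controlled (assuming Hugging Face transformers with bitsandbytes; the Magistral repo id is my guess and may need adjusting): load both models at the same 4-bit quantization and run the exact same prompts through each.

```python
# Rough sketch of a controlled comparison: identical prompts, identical 4-bit
# quantization, only the model id changes. Repo ids are assumptions, adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.3",
    "mistralai/Magistral-Small-2506",  # assumed repo id for Magistral 24B
]
PROMPTS = [
    "Extract the invoice number and total amount due from the text below: ...",
    "Quote source",  # the kind of terse follow-up mentioned earlier in the thread
]

quant = BitsAndBytesConfig(load_in_4bit=True)  # same quantization for every model

for model_id in MODELS:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )
    for p in PROMPTS:
        inputs = tok.apply_chat_template(
            [{"role": "user", "content": p}],
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        out = model.generate(inputs, max_new_tokens=256, do_sample=False)
        reply = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
        print(f"[{model_id}] {p!r} -> {reply}")
    # Free GPU memory before loading the next model.
    del model
    torch.cuda.empty_cache()
```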
