Mistral or LLaMA?

Which model is better for a chatbot fine-tuned on healthcare data?

  • Meta-Llama-3-8B-Instruct
  • Mistral-7B-Instruct-v0.2

We have been getting great results with Mistral and were about to initiate our final training, but now Meta has released this new version, and so I am hoping people can offer their two cents to aid our decision.

Thank you in advance!

I'm in the same position. How has your experience been with Mistral?

@singhay, Mistral has been really great.

We’ve been using the Mixtral-8x7B-Instruct-v0.1 model to preprocess our training samples through the fireworks.ai API at $0.50/1M tokens. It’s cheaper than GPT-4, the API has been reliable, and the model is excellent at formatting JSON.
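One practical note on the JSON step: models sometimes wrap their output in Markdown code fences, so it helps to strip those before parsing. A rough sketch of that kind of cleanup (hypothetical helper, not our exact pipeline code):

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Strip optional ```json ... ``` fences a model may add, then parse.

    Raises json.JSONDecodeError if the remainder still isn't valid JSON,
    which is a useful signal to re-queue the sample for another pass.
    """
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    return json.loads(cleaned)

# Handles both fenced and bare model output:
parse_json_response('```json\n{"diagnosis": "none"}\n```')
parse_json_response('{"diagnosis": "none"}')
```

Catching the parse failure and retrying the sample is cheaper than discovering malformed rows at fine-tuning time.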

We’ve then been taking our preprocessed samples and fine-tuning Mistral-7B-Instruct-v0.2 using together.ai, and our first two test trainings blew us away. We’re almost finished preprocessing our entire dataset and are about to fine-tune a model using 1M samples, so we’re really excited!

I plan on doing a smaller test fine-tune with LLaMA 3 8B Instruct after our full training. I can’t see any major benefit that justifies changing our strategy, since we’re really focused on just releasing our version 1 model at this point. But we do plan to run some side-by-side tests on a smaller dataset of around 50K samples so we can consider LLaMA for our version 2 model.

One thing I will say about Mistral is that, as far as I’m aware, they don’t have a designated syntax for a system prompt. We emulate one by including two messages (user and assistant) at the start of our messages array, where the role and system-prompt-style instructions can be established, and it works great.
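To illustrate, here’s roughly what that emulated system prompt looks like (the prompt text and helper name are made up for illustration, not our production code):

```python
# Mistral-7B-Instruct-v0.2's chat format has no dedicated "system" role,
# so we fake one with an opening user/assistant exchange.
SYSTEM_PROMPT = (
    "You are a careful healthcare assistant. "
    "Answer concisely and recommend seeing a clinician when appropriate."
)

def build_messages(user_question: str) -> list[dict]:
    """Prepend a synthetic exchange that establishes system-level behavior."""
    return [
        {"role": "user", "content": SYSTEM_PROMPT},
        {"role": "assistant", "content": "Understood. I'll follow those instructions."},
        {"role": "user", "content": user_question},
    ]

messages = build_messages("What are common symptoms of dehydration?")
```

Because the roles still strictly alternate user/assistant, this stays compatible with Mistral's instruct chat template while giving the model persistent instructions up front.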

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.