Best latest baseline cardiology model available today

Hello all, I would like to start a small use case in the cardiology area. Which is the best base model I can start with?


Since the topic relates to the medical field, I recommend also asking your question on the Hugging Science Discord.


As of March 18, 2026, the best baseline to start a small cardiology use case with is MedGemma 1.5 4B Instruct. Google released MedGemma 1.5 in January 2026 as the updated medical model in the MedGemma line, and its Hugging Face collection was updated 6 days ago, so it is still the most current major open medical foundation-model option in this family. (Google Research)

The direct recommendation

Use google/medgemma-1.5-4b-it as your starting point. It is the best first baseline because it is a medical-native, current, compute-efficient, multimodal model that was explicitly positioned as an adaptable starting point for healthcare developers, not as a fixed demo model. Google’s own model documentation says MedGemma 1.5 4B is meant to be a compute-efficient starting point, small enough to run offline, and that developers are expected to fine-tune it for their specific use case. (Google Research)
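Getting a first answer out of an instruct checkpoint like this usually follows the standard Hugging Face chat-messages pattern. The sketch below only shows the message structure; actually loading `google/medgemma-1.5-4b-it` (for example via a `transformers` pipeline, after accepting the model license) is omitted, and the system prompt wording is my own illustration, not from Google's documentation:

```python
def build_messages(system: str, question: str) -> list[dict]:
    """Standard chat-messages structure used by Hugging Face chat
    templates; this list is what you would pass to
    tokenizer.apply_chat_template(...) or a transformers pipeline."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    "You are a cardiology documentation assistant. Answer only from the "
    "provided note and flag any uncertainty.",
    "What was the ejection fraction in this echo report?",
)
```

The same message list works unchanged when you later swap the 4B checkpoint for the 27B one, which is part of why starting small and upgrading is cheap.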

Why this is the best first baseline

Cardiology is not one task. A “cardiology model” may need to handle reports, EHR text, guidelines, patient instructions, scanned documents, ECG images, or raw ECG signals. Broad reviews of clinical LLM use keep finding that there is no single generalist model that works best across a wide range of clinical tasks, and real deployment usually needs task-specific adaptation. That is why the best first choice is not the most exotic cardiology-specific model. It is the strongest practical medical foundation model that can be adapted cleanly. (JMIR)

MedGemma 1.5 4B fits that role unusually well. Its published capabilities include medical document understanding, EHR understanding, improved medical text reasoning, and multimodal medical comprehension. Those are exactly the abilities that matter for many early cardiology projects such as discharge-summary QA, cardiology report extraction, registry abstraction, referral triage, guideline lookup, and scanned-note understanding. (Google for Developers)

Its benchmark profile also looks like that of a general-purpose baseline rather than a narrow demo. On Google’s published evaluations, MedGemma 1.5 4B improved over MedGemma 1 4B on MedQA and MedMCQA, reached 89.6 on EHRQA, and showed strong document-extraction results on raw PDF-to-JSON medical-report tasks, including 91.0 macro F1 on one internal raw-PDF evaluation. Those numbers do not mean the problem is “solved.” They do mean the model is already strong in the exact workflow categories where small healthcare pilots usually begin. (Hugging Face)
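If you prototype that PDF-to-JSON workflow, it pays to validate the model's output before it enters any downstream registry. A minimal sketch, with an entirely hypothetical field schema (the required fields below are my illustration, not from Google's evaluation):

```python
import json

# Hypothetical schema for a cardiology-report extraction task.
REQUIRED_FIELDS = {"patient_id", "ejection_fraction", "diagnosis"}

def parse_report_json(model_output: str) -> dict:
    """Parse and sanity-check JSON emitted by an extraction model.

    Returns the parsed record, or raises ValueError if the output is
    not valid JSON or is missing required fields.
    """
    try:
        record = json.loads(model_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"model did not return valid JSON: {err}")
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

# A well-formed (invented) model response passes the check.
ok = parse_report_json(
    '{"patient_id": "A-101", "ejection_fraction": 55, "diagnosis": "NSTEMI"}'
)
```

In a real pilot you would extend the checks (value ranges, units, controlled vocabularies), but even this thin layer catches the most common failure mode of extraction models: almost-JSON output.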

Why not start with a larger model first

If you only care about text and have enough compute, MedGemma 27B text-only is stronger on text benchmarks. Google’s model card says it is trained exclusively on medical text, and its published numbers beat the 4B model on text tasks such as MedQA and EHRNoteQA. But that is not the same as being the best baseline. A baseline is supposed to be fast to test, cheap to adapt, and broad enough for the first iteration. The 27B model is better viewed as the upgrade path after the first prototype works. (Hugging Face)

So the rule is simple:

  • Best first baseline for most small cardiology projects: MedGemma 1.5 4B Instruct. (Hugging Face)
  • Best next step if the project is purely text and you want more accuracy: MedGemma 27B text-only. (Hugging Face)

The one big exception

If your cardiology use case is raw ECG waveform modeling, then MedGemma is not the right first choice. In that case, start with ECGFounder instead. ECGFounder was trained on over 10 million ECGs with 150 label categories, was designed to work both out of the box and through downstream fine-tuning, and was built specifically for ECG analysis across multiple domains, including lower-rank and single-lead settings. For ECG classification, arrhythmia detection, or signal-level transfer learning, that is the more appropriate baseline. (arXiv)
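Signal-level baselines of this kind typically consume fixed-length windows cut from a longer recording. As a hedged sketch (the 500 Hz sampling rate and 10 s window here are illustrative assumptions, not ECGFounder's documented input spec; check the model card before preprocessing real data):

```python
def window_ecg(signal, fs=500, window_s=10, step_s=5):
    """Split a single-lead ECG (a flat list of samples) into
    overlapping fixed-length windows for a signal-level classifier.

    fs:        sampling rate in Hz (illustrative; check the model card)
    window_s:  window length in seconds
    step_s:    hop between window starts in seconds
    """
    win, step = fs * window_s, fs * step_s
    return [signal[i:i + win]
            for i in range(0, len(signal) - win + 1, step)]

# 30 s of placeholder signal at 500 Hz -> five overlapping 10 s windows
windows = window_ecg([0.0] * (500 * 30))
```

Window-level predictions are then aggregated (for example by averaging logits) into a recording-level label, which is the usual transfer-learning setup for arrhythmia detection.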

What this means in plain English

If your first cardiology project looks like any of these:

  • extract structured fields from cardiology reports,
  • answer questions from cardiology notes or discharge summaries,
  • build a guideline-grounded assistant,
  • summarize or rewrite cardiology documentation,
  • handle mixed text-plus-document inputs,

then start with MedGemma 1.5 4B Instruct. Its capability profile matches those tasks well, and recent cardiology RAG work also shows that strong retrieval plus a good medical base model is a very effective pattern for cardiology knowledge tasks. (Hugging Face)
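The retrieve-then-generate shape of that RAG pattern is simple to prototype. A minimal sketch using crude lexical overlap (a real pipeline would use dense embeddings and an index; the guideline snippets below are invented, not real guideline text):

```python
def _tokens(text: str) -> set[str]:
    """Lowercase, whitespace-split, strip trailing punctuation."""
    return {w.strip(".,;:") for w in text.lower().split()}

def score(query: str, passage: str) -> float:
    """Crude lexical-overlap relevance score in [0, 1]."""
    q, p = _tokens(query), _tokens(passage)
    return len(q & p) / max(len(q), 1)

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the top-k passages to prepend to the model prompt."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

# Invented stand-in snippets; a real system would index actual guideline text.
snippets = [
    "Beta blockers are recommended after myocardial infarction.",
    "Statin therapy reduces LDL cholesterol in secondary prevention.",
    "Anticoagulation is indicated for atrial fibrillation with stroke risk.",
]
top = retrieve("therapy after myocardial infarction", snippets, k=1)
```

The retrieved passages are then pasted into the model prompt as grounding context, which is the whole trick: the base model supplies medical language competence, retrieval supplies current cardiology facts.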

If your first project is instead:

  • ECG waveform classification,
  • signal-level prediction,
  • wearable/single-lead ECG transfer,
  • ECG representation learning,

then start with ECGFounder. (arXiv)

Final answer

If you want one model name to begin with, use:

MedGemma 1.5 4B Instruct

That is the best latest practical baseline for a small cardiology use case today because it is current, medical, multimodal, efficient, and intended to be adapted to real healthcare tasks. If your project is specifically raw ECG, switch to ECGFounder immediately. (Google Research)


Thanks John. I will download this and let you know my results. Also, I would like to know if there are other models from Llama or Mistral which are specific to the medical field?


other models from Llama or Mistral which are specific to the medical field?

Yeah. It’s safest to search using the leaderboard, but it’s quicker to search the Hub using the keyword med.


Here is the ranked shortlist of medical Llama- and Mistral-based models on Hugging Face that are worth knowing first.

I am ranking for practical starting value, not just novelty. That means I am favoring models that look like real baselines, have usable model cards, and are broad enough for general medical or cardiology-adjacent work. I am not counting quantizations, one-off adapters, or tiny community forks as separate core entries. (Hugging Face)

Start here first

1. m42-health/Llama3-Med42-8B

This is the best Llama-based general medical baseline in the shortlist. The card says Med42-v2 is a suite of clinically aligned Llama-3 models in 8B and 70B, trained on about 1B tokens, with intended uses including medical QA and patient-record summarization. If you want one broad medical Llama model to test first, this is the cleanest starting point. The 70B version is stronger, but the 8B model is the more practical baseline. (Hugging Face)

2. HPAI-BSC/Llama3.1-Aloe-Beta-8B

This is the strongest newer Llama-based research alternative in the shortlist. The card says Aloe is trained on 20 medical tasks, that the Beta release is the latest Aloe iteration, and that the 8B Beta expanded training to 1.8B tokens across more task types such as summarization, diagnosis, classification, and treatment recommendation. The main drawback is licensing: Aloe modifications are under CC-BY-NC-4.0, so it is non-commercial unless that fits your use case. (Hugging Face)

3. BioMistral/BioMistral-7B

This is the best Mistral-based medical text baseline here. The card says BioMistral is an Apache-2.0 open-source medical model built on Mistral-7B-Instruct-v0.1, further pre-trained on PubMed Central, and evaluated on 10 established medical QA tasks in English. If you specifically want a Mistral-family medical checkpoint, this is the first one to try. The caution is also explicit in the card: it is positioned as a research tool, not a clinically validated deployment model. (Hugging Face)

4. johnsnowlabs/JSL-MedLlama-3-8B-v2.0

This is a solid Llama-based medical text model with visible benchmark numbers on the card, including MedMCQA, MedQA, PubMedQA, and MMLU medical subsets. I rank it below Med42 and Aloe because the license is more restrictive, CC-BY-NC-ND-4.0, and the card is lighter on training details. Still, it is a legitimate model, not a throwaway community fine-tune. (Hugging Face)

5. dmis-lab/meerkat-7b-v1.0

This is the most interesting reasoning-focused Mistral-based medical model in the shortlist. The card says it is based on Mistral-7B-v0.1, trained on synthetic chain-of-thought data derived from 18 medical textbooks, and claims to be the first 7B medical model to exceed the 60% USMLE passing threshold. I would test it when you care about exam-style reasoning or case-style dialogue more than broad production fit. (Hugging Face)

Test later or only if the task matches

6. UMCU/CardioLlama.nl_clinical

This is the most clearly cardiology-specific Llama model I could verify, but it is also very narrow. The Hugging Face material shows it is based on Llama-3.2-1B-Instruct, domain-adapted on a Dutch medical corpus, then further pre-trained on 5 million cardiology records mixed with broader Dutch medical text. It was also updated October 29, 2025. I would only move this high in the ranking if your work is specifically Dutch cardiology text. For English general medical work, it is too specialized and too small to be the first baseline. (Hugging Face)

7. microsoft/llava-med-v1.5-mistral-7b

This is the best Mistral-based biomedical vision-language model in the shortlist, not the best general medical text model. The card says it uses Mistral-7B-Instruct-v0.2, was trained in April 2024, builds on the PMC-15M biomedical image-text dataset, and is intended for research use only, not clinical care or deployed use. Use it for biomedical VQA and image-text experiments, not as a general cardiology text baseline. (Hugging Face)

8. ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1

This is a multimodal Llama-3 medical model trained on a custom biomedical text-and-image dataset with 500,000+ entries. The card positions it for biomedical research, education, and decision-support-style use cases, but it also uses a non-commercial custom license and the validation story is lighter than the higher-ranked entries. I would treat it as a niche multimodal experiment, not the first broad medical baseline. (Hugging Face)

The simple decision rule

If you want the cleanest answer:

  • Best Llama medical model to start with: Med42-v2 8B. (Hugging Face)
  • Best newer Llama research alternative: Llama3.1-Aloe-Beta-8B. (Hugging Face)
  • Best Mistral medical text model: BioMistral-7B. (Hugging Face)
  • Best Mistral medical multimodal model: LLaVA-Med v1.5 Mistral-7B, but only for research. (Hugging Face)
  • Best cardiology-specific Llama niche model: CardioLlama.nl_clinical, only if your domain is Dutch cardiology text. (Hugging Face)

My practical recommendation

For a general medical or cardiology-adjacent text project, I would test them in this order:

  1. m42-health/Llama3-Med42-8B
  2. HPAI-BSC/Llama3.1-Aloe-Beta-8B
  3. BioMistral/BioMistral-7B
  4. johnsnowlabs/JSL-MedLlama-3-8B-v2.0
  5. dmis-lab/meerkat-7b-v1.0 (Hugging Face)

For a cardiology-specific direction, I would still start from one of the broad medical baselines above unless you specifically need Dutch cardiology, in which case CardioLlama becomes much more relevant. (Hugging Face)