What’s the best open-source model to generate safe, structured, and empathetic rehab prompts for Parkinson’s patients in a LangChain-based PT coaching agent (starting with Mistral, Zephyr, or something healthcare-tuned)?

Hi everyone,
I’m building a HIPAA-ready agentic AI platform to support physical therapists and their patients between sessions — starting with Parkinson’s Disease rehab. The idea is to simulate a multidisciplinary team where each agent (PT, SLP, neuro, etc.) delivers adaptive, natural-sounding coaching.

I’m currently designing the PT Agent, which focuses on:

  • Evidence-based cueing (balance, resistance training, fall prevention)
  • Safe, structured output (never hallucinate or recommend risk-prone activities)
  • Emotionally supportive tone for older adults
  • Context-aware personalization over time
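
To make the "safe, structured output" requirement concrete, here's the kind of message schema I have in mind — purely a sketch, and the field names and blocklist terms are illustrative, not clinical guidance:

```python
from dataclasses import dataclass

# Example blocklist of high-risk activity terms -- illustrative only,
# not a real clinical safety list.
RISK_TERMS = {"jump", "sprint", "heavy lifting"}

@dataclass
class RehabMessage:
    """One coaching message the model must fill in, field by field."""
    greeting: str
    exercise: str        # e.g. "seated marches, 2 sets of 10"
    safety_note: str     # always present, e.g. "hold a stable surface"
    encouragement: str

    def is_safe(self) -> bool:
        """Reject messages that mention blocklisted high-risk activities."""
        text = f"{self.exercise} {self.safety_note}".lower()
        return not any(term in text for term in RISK_TERMS)
```

Forcing the model to fill a schema like this (and validating before sending) is how I'm hoping to keep output short, consistent, and auditable.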

I’ve already structured the prompts and sessions based on APTA and literature guidelines, and I’m routing logic via LangChain. I plan to fine-tune or instruct-tune if needed.
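
For context, the routing idea is roughly this (a framework-agnostic sketch — LangChain has its own branching APIs, and the agent names and keywords here are placeholders I made up):

```python
# Keyword-based router: send a patient message to the agent whose
# keyword list best matches it. Agents and keywords are illustrative.
AGENT_KEYWORDS = {
    "pt": ["balance", "exercise", "fall", "walk"],
    "slp": ["speech", "voice", "swallow"],
    "neuro": ["tremor", "medication", "symptom"],
}

def route(message: str) -> str:
    """Pick the agent with the highest keyword overlap; default to PT."""
    text = message.lower()
    scores = {
        agent: sum(kw in text for kw in kws)
        for agent, kws in AGENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "pt"
```

In practice I'd likely replace the keyword match with an LLM classifier, but the control flow is the same.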

Question:
Given this use case, which open-source model would you recommend for generating short, safe, structured, and empathetic rehab messages?
Would you lean toward:

  1. Mistral-7B-Instruct or Zephyr for a balance of cost vs. coherence?
  2. T5 or LLaMA 2 if aiming for lightweight + tight control?
  3. A smaller fine-tuned model (e.g., MedAlpaca or BioGPT) for healthcare tone?

I’d love advice from those who’ve done health-aligned generation, especially if you’ve worked with safety filters, few-shot rehab tasks, or patient-facing chat.

Thanks in advance!
Alexia, Physical Therapist and Founder at loyalbee.ai


1 & 3

I think Zephyr is the better of the two, but those models were mainstream a year ago. Progress has been rapid, and models roughly half the size can now achieve similar performance. The surest way to find promising models is to check leaderboards and similar rankings.

2

Many people use RAG or agent pipelines (LangChain is a pioneer in this area), which combine lightweight models such as T5 with an LLM and some orchestration code. This approach works well because the output is easy to control and the answers stay grounded in retrieved data (like an LLM responding with a dictionary at hand), which makes it especially practical on consumer-level hardware. There are also many educational resources available on Hugging Face.
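
A minimal sketch of that RAG idea — the "retriever" here is just a keyword search over a few made-up guideline snippets (not real APTA text), and `generate()` would be whatever LLM you plug in:

```python
# Toy retrieval-augmented generation pipeline: retrieve guideline
# snippets first, then build a prompt that restricts the model to them.
GUIDELINES = [
    "Balance training: practice weight shifts near a stable support.",
    "Resistance training: 2-3 sessions per week at moderate intensity.",
    "Fall prevention: clear walkways and use cueing strategies.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank snippets by shared words with the query (toy retriever)."""
    words = set(query.lower().split())
    ranked = sorted(
        GUIDELINES,
        key=lambda s: len(words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Ground the model: answer only from the retrieved context."""
    context = "\n".join(retrieve(query))
    return (
        f"Use ONLY this context:\n{context}\n\n"
        f"Patient question: {query}\nCoaching reply:"
    )
```

A real setup would swap the keyword match for an embedding index, but even this toy version shows why the model's output becomes easier to control: it can only draw on what you hand it.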

Some of the Hugging Face staff and people working on the Hub are doctors, so asking them for expert support might be a good idea. When possible, asking someone who knows the domain is almost always the best option…