Proposal: Modular, Domain & Subdomain-Aware MoE for Mistral—Next Steps?

MoE LLMs (like Mixtral) have set a new bar for efficient scaling. But today's open MoEs route at the token level, with expert specialization emerging only implicitly.

Recent research (TaskMoE, DomainMoE, THOR-MoE, GLaM) explores explicit routing by domain and even subdomain. This enables:

- Targeted upgrades (swap in a better “math” or “literature” expert without retraining the whole model)
- More interpretable model internals
- Modularity that aligns with how orchestrators (AutoGen, CrewAI, MCP) are evolving

What might this look like for Mistral?

- Expert groups per domain (English, math, code, etc.)
- Hierarchies within domains (e.g., arithmetic → algebra → calculus), potentially with meta-experts that arbitrate or combine outputs
- A possible “expert registry” for community or enterprise swapping/upgrading (a rough routing sketch follows this list)
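For concreteness, here is a minimal PyTorch sketch of what two-level routing could look like: a sequence-level domain gate picks an expert group, then a Mixtral-style token-level gate routes within that group. Everything here is an assumption for illustration only; the class name `DomainAwareMoELayer`, the domain list, and all sizes are hypothetical and not taken from Mistral's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAwareMoELayer(nn.Module):
    """Hypothetical two-level router: a domain gate selects an expert group
    (e.g. math, code, prose), then a token-level top-k gate routes within it."""

    def __init__(self, d_model=512, d_ff=1024, domains=("math", "code", "prose"),
                 experts_per_domain=4, top_k=2):
        super().__init__()
        self.domains = list(domains)
        self.top_k = top_k
        # One expert group per domain; each expert is a small FFN.
        self.expert_groups = nn.ModuleDict({
            name: nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(experts_per_domain)
            ])
            for name in self.domains
        })
        # Level 1: sequence-level domain gate. Level 2: token-level gates per domain.
        self.domain_gate = nn.Linear(d_model, len(self.domains))
        self.token_gates = nn.ModuleDict({
            name: nn.Linear(d_model, experts_per_domain) for name in self.domains
        })

    def forward(self, x, domain=None):
        # x: (batch, seq, d_model). If the domain is known (e.g. supplied by an
        # orchestrator), route explicitly; otherwise infer it from the mean
        # hidden state of the sequence (one domain per batch, for simplicity).
        if domain is None:
            logits = self.domain_gate(x.mean(dim=1))           # (batch, n_domains)
            domain = self.domains[int(logits.argmax(dim=-1)[0])]
        experts = self.expert_groups[domain]
        gate_logits = self.token_gates[domain](x)              # (batch, seq, n_experts)
        weights, idx = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(experts):
                mask = (idx[..., k] == e).unsqueeze(-1)        # tokens sent to expert e
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

layer = DomainAwareMoELayer()
tokens = torch.randn(2, 16, 512)
print(layer(tokens, domain="math").shape)  # torch.Size([2, 16, 512])
```

A real integration would need per-token or per-segment domain decisions plus the usual MoE machinery (load balancing, capacity limits, efficient dispatch), all of which this sketch deliberately omits.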

This isn’t trivial. Some questions:

- How should gating and training be handled to avoid catastrophic forgetting or interface mismatch?
- What’s the best way to benchmark the performance of swapped modules? (a swap-and-eval sketch follows these questions)
- Are there security or trust issues with open expert modules, and how do other plugin/package systems handle them?
- Who’s working on this already? Any public code, experiments, or ideas?
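On the benchmarking question, a first pass could be as simple as swapping one expert's weights, checking that its parameter interface matches, and re-running a held-out domain eval. This is only a sketch under the assumptions of the layer above: `swap_expert` and `quick_domain_eval` are hypothetical helpers, and a real benchmark would use task metrics (e.g. GSM8K for a math expert) rather than the toy MSE probe shown.

```python
import torch

def swap_expert(model, domain, slot, new_state_dict):
    """Replace one expert in a DomainAwareMoELayer-style module, verifying the
    parameter interface matches before committing the swap."""
    expert = model.expert_groups[domain][slot]
    current = expert.state_dict()
    # Interface check: same parameter names and shapes, or refuse the swap.
    mismatched = [k for k in current
                  if k not in new_state_dict or new_state_dict[k].shape != current[k].shape]
    if mismatched:
        raise ValueError(f"Expert interface mismatch on: {mismatched}")
    expert.load_state_dict(new_state_dict)
    return model

def quick_domain_eval(model, batches, domain):
    """Toy probe: average reconstruction error on held-out domain data.
    Compare before and after a swap; lower would suggest the new expert helps."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, target in batches:
            pred = model(x, domain=domain)
            total += torch.nn.functional.mse_loss(pred, target).item()
            n += 1
    return total / max(n, 1)
```

The interface check is also where a registry could hook in trust mechanisms (pinned checksums, signed manifests), analogous to how package managers pin dependencies.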

Links:

- TaskMoE:
- DomainMoE:
- THOR-MoE:
- AutoGen: https://github.com/microsoft/autogen
- CrewAI: https://github.com/joaomdmoura/crewAI
- ModelContextProtocol: https://github.com/modelcontextprotocol/servers

Would love thoughts, critique, and collaboration. Is this plausible as the next step for Mixtral (or other open MoEs)? What would it take to make this real?

TL;DR
Is it time for modular, upgradeable, domain-aware MoE in open models like Mistral? What’s missing—and who’s already working on it?


First things first: hello @seth615, and welcome to posting on HF.
Second: I love reading what people present here. I posted mine already, so you can check that out if you like. Now I shall read your post.