MoE LLMs (like Mixtral) have set a new bar for efficient scaling. But all open MoEs route at the token level, with expert specialization emerging implicitly.
Recent research (TaskMoE, DomainMoE, THOR-MoE, GLaM) explores explicit routing by domain and even subdomain; a rough sketch of what that gating could look like follows this list. This enables:
Targeted upgrades (swap in a better “math” or “literature” expert without retraining the whole model)
More interpretable model internals
Modularity that aligns with how orchestrators (AutoGen, CrewAI, MCP) are evolving
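To make the first point concrete, here is a minimal PyTorch sketch of what two-level, domain-aware routing could look like: a coarse gate picks one domain's expert group per sequence, then ordinary top-k token routing runs inside that group only. Everything here (class name, domain list, sizes, hard sequence-level routing) is a hypothetical illustration, not Mixtral's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAwareMoE(nn.Module):
    """Two-level routing: a coarse domain gate picks an expert group per sequence,
    then standard top-k token routing runs inside that group only."""

    def __init__(self, d_model=512, d_ff=1024,
                 domains=("english", "math", "code"), experts_per_domain=4, top_k=2):
        super().__init__()
        self.domains = list(domains)
        self.top_k = top_k
        # One group of FFN experts per domain (the swappable unit).
        self.expert_groups = nn.ModuleDict({
            d: nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(experts_per_domain)
            ]) for d in self.domains
        })
        self.domain_gate = nn.Linear(d_model, len(self.domains))   # sequence-level gate
        self.token_gates = nn.ModuleDict({                         # token-level gates
            d: nn.Linear(d_model, experts_per_domain) for d in self.domains
        })

    def forward(self, x):                                # x: (batch, seq, d_model)
        # Hard domain choice from mean-pooled hidden states.
        domain_idx = self.domain_gate(x.mean(dim=1)).argmax(dim=-1)   # (batch,)
        out = torch.zeros_like(x)
        for b, idx in enumerate(domain_idx.tolist()):
            domain = self.domains[idx]
            experts = self.expert_groups[domain]
            weights, chosen = self.token_gates[domain](x[b]).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            for k in range(self.top_k):                  # weighted sum of top-k experts
                for e in range(len(experts)):
                    mask = chosen[:, k] == e
                    if mask.any():
                        out[b, mask] += weights[mask, k].unsqueeze(-1) * experts[e](x[b, mask])
        return out
```

Hard per-sequence routing is just one choice; per-token domain gating, soft mixtures of groups, or the meta-experts mentioned below would change the picture, but the swappable unit (one expert group per domain) stays the same.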
What might this look like for Mistral?
Expert groups per domain (English, math, code, etc.)
Hierarchies within domains (e.g., arithmetic → algebra → calculus), potentially with meta-experts that arbitrate or combine outputs
A possible “expert registry” for community or enterprise swapping/upgrading
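One rough guess at what such a registry could look like: a JSON manifest mapping each domain to a versioned, hashed weight file, plus a helper that swaps a single expert group in place. The file layout, field names, and the parameter-name prefix (taken from the hypothetical DomainAwareMoE sketch above) are all assumptions.

```python
import json
from pathlib import Path
import torch

# experts/registry.json (hypothetical format):
# {
#   "math": {"version": "1.2.0", "weights": "experts/math-1.2.0.pt", "sha256": "..."},
#   "code": {"version": "0.9.1", "weights": "experts/code-0.9.1.pt", "sha256": "..."}
# }

def load_registry(path="experts/registry.json"):
    return json.loads(Path(path).read_text())

def swap_expert_group(model, domain, registry):
    """Swap one domain's expert-group weights in-place; everything else is untouched.
    Assumes the checkpoint uses the same parameter names as the model
    (e.g. 'expert_groups.math.0.0.weight'), so shapes and interfaces must match."""
    entry = registry[domain]
    new_state = torch.load(entry["weights"], map_location="cpu")
    prefix = f"expert_groups.{domain}."
    current = model.state_dict()
    for name, tensor in new_state.items():
        if not name.startswith(prefix):
            raise KeyError(f"unexpected parameter {name!r} for domain {domain!r}")
        if current[name].shape != tensor.shape:
            raise ValueError(f"shape mismatch for {name!r}: expert interface changed")
        current[name] = tensor
    model.load_state_dict(current)
    return entry["version"]
```

A registry can't hide the hard constraint, though: a swapped group only drops in cleanly if hidden sizes and gating interfaces are treated as frozen contracts, which is exactly the first open question below.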
This isn’t trivial. Some questions:
How should gating and training be handled so that adding or swapping an expert doesn't cause catastrophic forgetting or an interface mismatch? (One conservative baseline, freezing everything but the target expert group, is sketched after this list.)
What’s the best way to benchmark a swapped module, both on its target domain and for regressions everywhere else? (A minimal A/B harness follows the list.)
Are there security or trust issues with openly distributed expert modules, and how do other plugin/package ecosystems handle them? (A hash-verification sketch follows the list.)
Who’s working on this already? Any public code, experiments, or ideas?
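On the first question, the most conservative baseline I can think of (sketched below, reusing the hypothetical parameter names from the earlier sketch) is to freeze every shared weight and train only the target expert group, optionally together with its token gate, so nothing outside that group can be forgotten and the routing interface never moves.

```python
import torch

def prepare_for_expert_finetune(model, domain, train_gate=False, lr=1e-4):
    """Freeze all shared weights; train only one domain's expert group
    (and optionally its token gate). Parameter names follow the sketch above."""
    trainable = []
    for name, param in model.named_parameters():
        keep = name.startswith(f"expert_groups.{domain}.") or (
            train_gate and name.startswith(f"token_gates.{domain}.")
        )
        param.requires_grad = keep
        if keep:
            trainable.append(param)
    return torch.optim.AdamW(trainable, lr=lr)

# opt = prepare_for_expert_finetune(model, "math", train_gate=True)
# ...then fine-tune on math-domain data only. Shared weights never move, so nothing
# outside the math group can be forgotten; the open question is whether a frozen
# domain gate still routes well to the retrained experts.
```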
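On benchmarking, a bare-bones A/B harness seems like the obvious starting point: run the same per-domain eval sets before and after the swap and look at the deltas. `evaluate()` and the eval-set format are placeholders.

```python
def compare_swapped_expert(evaluate, base_model, swapped_model, eval_sets):
    """evaluate(model, dataset) -> float score, higher is better (placeholder).
    eval_sets: {"math": math_eval, "code": code_eval, ...} per-domain benchmarks."""
    report = {}
    for domain, dataset in eval_sets.items():
        before = evaluate(base_model, dataset)
        after = evaluate(swapped_model, dataset)
        report[domain] = {"before": before, "after": after, "delta": after - before}
    return report
```

The upgraded domain should improve; every other domain acts as a regression suite and should stay within noise, otherwise the swap leaked side effects through the shared layers.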
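On trust, package ecosystems mostly rely on content hashes and signatures (pip's --require-hashes, npm lockfile integrity fields, Sigstore for signing). A minimal equivalent for expert modules would be checking the weight file's SHA-256 against the registry entry before loading; using safetensors or torch.load(weights_only=True) also avoids pickle-based code execution from untrusted checkpoints.

```python
import hashlib
from pathlib import Path

def verify_expert_weights(entry):
    """Refuse to load expert weights whose SHA-256 doesn't match the registry entry."""
    digest = hashlib.sha256(Path(entry["weights"]).read_bytes()).hexdigest()
    if digest != entry["sha256"]:
        raise RuntimeError(f"hash mismatch for {entry['weights']}; refusing to load")
```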
Links:
TaskMoE:
DomainMoE:
THOR-MoE:
AutoGen: https://github.com/microsoft/autogen
CrewAI: https://github.com/joaomdmoura/crewAI
ModelContextProtocol: https://github.com/modelcontextprotocol/servers
Would love thoughts, critique, and collaboration. Is this plausible as the next step for Mixtral (or other open MoEs)? What would it take to make this real?
TL;DR
Is it time for modular, upgradeable, domain-aware MoE in open models like Mistral? What’s missing—and who’s already working on it?