Hi everyone,
I’m working on a personal project - building a conversational chatbot that solves user queries using tools hosted on a remote MCP (Model Context Protocol) server. I could really use some advice or suggestions on improving the agent architecture for better accuracy and efficiency.
Project Overview
- The MCP server hosts a set of tools (essentially APIs) that my chatbot can invoke.
- Each tool is independent, but in many scenarios the output of one tool becomes the input to another.
- The chatbot should handle:
  - Simple queries requiring a single tool call.
  - Complex queries requiring multiple tools invoked in the right order.
  - Ambiguous queries, where it must ask clarifying questions before proceeding.
What I’ve Tried So Far
1. Simple ReAct Agent
- A basic loop: tool selection → tool call → final text response (rough sketch after this list).
- Worked fine for single-tool queries.
- Failed or hallucinated tool inputs in scenarios that required multiple tool calls in the right order.
- Failed to ask clarifying questions when they were needed.
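For concreteness, here's roughly what that loop looked like. This is a minimal sketch: `llm_chat` and `mcp_call_tool` are stand-ins for my model and MCP clients, and the message format follows the OpenAI-style tool-calling convention.

```python
import json

MAX_STEPS = 8  # hard cap so a confused model can't loop forever

def react_turn(user_query, tools_schema, llm_chat, mcp_call_tool):
    """One conversational turn: act/observe until the model produces text."""
    messages = [
        {"role": "system", "content": "Answer the user. Call tools when needed."},
        {"role": "user", "content": user_query},
    ]
    for _ in range(MAX_STEPS):
        reply = llm_chat(messages, tools=tools_schema)  # one model round-trip
        messages.append(reply)                          # keep the turn in history
        if not reply.get("tool_calls"):                 # no tool requested, so
            return reply["content"]                     # treat it as the answer
        for call in reply["tool_calls"]:                # run each requested tool
            result = mcp_call_tool(call["name"], json.loads(call["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    return "I couldn't finish within my step budget. Could you rephrase?"
```

The failure mode is visible in the structure: nothing forces the model to gather step N's output before guessing step N+1's arguments, so weaker models simply hallucinate the inputs.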
2. Planner–Executor–Replanner Agent
- The Planner generates a full execution plan (tool sequence + clarifying questions).
- The Executor (a ReAct agent) executes each step using the available tools.
- The Replanner monitors execution and updates the plan dynamically if something changes (sketch after this list).

Pros: Significantly improved accuracy on complex tasks.
Cons: Latency became a big issue. Responses took 15–60 seconds per turn, which kills conversational flow.
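Structurally the turn looks something like this (again a sketch with placeholder callables, not my exact code):

```python
def planner_executor_turn(user_query, plan_llm, execute_step, replan_llm):
    """Planner drafts the full plan up front; the executor runs it step by
    step; the replanner revises the remainder after every result."""
    plan = plan_llm(user_query)  # list of step dicts, e.g.
                                 # [{"tool": "lookup_user", "args": {...}}, ...]
    results = []
    while plan:
        step = plan.pop(0)
        if step.get("ask_user"):             # planner flagged missing info:
            return step["ask_user"]          # surface the clarifying question
        results.append(execute_step(step, results))   # inner ReAct agent
        plan = replan_llm(user_query, results, plan)  # extra LLM call per step
    return results[-1] if results else "Nothing to do."
```

Writing it out made the latency source obvious: each step costs at least two extra model round-trips (executor plus replanner) on top of the initial plan, so a four-step task can easily rack up nine or more LLM calls.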
Performance Benchmark
To compare, I tried the same MCP tools with Claude Desktop, and it was impressive:
- Accurately planned and executed tool calls in order.
- Asked clarifying questions proactively.
- Response time: ~2–3 seconds. That's exactly the kind of balance between accuracy and speed I want.
What I’m Looking For
I’d love to hear from folks who’ve experimented with:
- Alternative agent architectures (beyond ReAct and Planner–Executor).
- Ideas for reducing latency while maintaining reasoning quality.
- Caching, parallel tool execution, or lightweight planning approaches (see the sketch after this list).
- Ways to replicate Claude's behavior using open-source models (I'm constrained to Mistral, LLaMA, and GPT-OSS).
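On the parallel-execution point, this is the kind of thing I've been considering. It assumes an async MCP client session exposing a `call_tool(name, arguments)` method; independent calls get fanned out with `asyncio.gather`:

```python
import asyncio

async def run_independent_calls(session, calls):
    """Fan out tool calls that don't depend on each other's outputs.
    `session` is whatever async MCP client you use; `calls` is a list of
    (tool_name, arguments) pairs."""
    tasks = [session.call_tool(name, args) for name, args in calls]
    return await asyncio.gather(*tasks)  # total latency = slowest single call

# Hypothetical usage: weather and calendar don't feed each other,
# so they can run concurrently instead of back to back.
# results = await run_independent_calls(session, [
#     ("get_weather", {"city": "Pune"}),
#     ("get_calendar", {"date": "2025-01-10"}),
# ])
```

Of course, this only helps when the agent can tell which steps are independent, which seems like another argument for producing an explicit plan (even a lightweight one) rather than pure step-by-step ReAct.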
Lastly, I realize Claude models are much stronger than current open-source LLMs, but I'm curious how Claude achieves such fluid tool use.
- Is it primarily due to their highly optimized system prompts and fine-tuned model behavior?
- Are they using some form of internal agent architecture or workflow orchestration under the hood (like a hidden planner/executor system)?
If it’s mostly prompt engineering and model alignment, maybe I can replicate some of that behavior with smart system prompts. But if it’s an underlying multi-agent orchestration, I’d love to know how others have recreated that with open-source frameworks.