Hi everyone,
I’m working on a personal project - building a conversational chatbot that solves user queries using tools hosted on a remote MCP (Model Context Protocol) server. I could really use some advice or suggestions on improving the agent architecture for better accuracy and efficiency.
Project Overview
- The MCP server hosts a set of tools (essentially APIs) that my chatbot can invoke.
- Each tool is independent, but in many scenarios the output of one tool becomes the input to another.
- The chatbot should handle:
  - Simple queries requiring a single tool call.
  - Complex queries requiring multiple tools invoked in the right order.
  - Ambiguous queries, where it must ask clarifying questions before proceeding.
What I’ve Tried So Far
1. Simple ReAct Agent
- A basic loop: tool selection → tool call → final text response (rough sketch after this list).
- Worked fine for single-tool queries.
- Failed or hallucinated tool inputs in scenarios that required multiple tool calls in the right order.
- Failed to ask clarifying questions when they were needed.
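For concreteness, here's roughly what that loop looked like. This is a minimal sketch: `llm_chat` and `mcp_call_tool` are stand-ins for my model and MCP clients, and the message format follows the OpenAI-style tool-calling convention.

```python
import json

MAX_STEPS = 8  # hard cap so a confused model can't loop forever

def react_turn(user_query, tools_schema, llm_chat, mcp_call_tool):
    """One conversational turn: act/observe until the model produces text."""
    messages = [
        {"role": "system", "content": "Answer the user. Call tools when needed."},
        {"role": "user", "content": user_query},
    ]
    for _ in range(MAX_STEPS):
        reply = llm_chat(messages, tools=tools_schema)  # one model round-trip
        messages.append(reply)                          # keep the turn in history
        if not reply.get("tool_calls"):                 # no tool requested, so
            return reply["content"]                     # treat it as the answer
        for call in reply["tool_calls"]:                # run each requested tool
            result = mcp_call_tool(call["name"], json.loads(call["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    return "I couldn't finish within my step budget. Could you rephrase?"
```

The failure mode is visible in the structure: nothing forces the model to gather step N's output before guessing step N+1's arguments, so weaker models simply hallucinate the inputs.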
2. Planner–Executor–Replanner Agent
- The Planner generates a full execution plan (tool sequence + clarifying questions).
- The Executor (a ReAct agent) executes each step using the available tools.
- The Replanner monitors execution and updates the plan dynamically if something changes (sketch after this list).

Pros: Significantly improved accuracy on complex tasks.
Cons: Latency became a big issue. Responses took 15–60 seconds per turn, which kills conversational flow.
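Structurally the turn looks something like this (again a sketch with placeholder callables, not my exact code):

```python
def planner_executor_turn(user_query, plan_llm, execute_step, replan_llm):
    """Planner drafts the full plan up front; the executor runs it step by
    step; the replanner revises the remainder after every result."""
    plan = plan_llm(user_query)  # list of step dicts, e.g.
                                 # [{"tool": "lookup_user", "args": {...}}, ...]
    results = []
    while plan:
        step = plan.pop(0)
        if step.get("ask_user"):             # planner flagged missing info:
            return step["ask_user"]          # surface the clarifying question
        results.append(execute_step(step, results))   # inner ReAct agent
        plan = replan_llm(user_query, results, plan)  # extra LLM call per step
    return results[-1] if results else "Nothing to do."
```

Writing it out made the latency source obvious: each step costs at least two extra model round-trips (executor plus replanner) on top of the initial plan, so a four-step task can easily rack up nine or more LLM calls.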
Performance Benchmark
To compare, I tried the same MCP tools with Claude Desktop, and it was impressive:
- Accurately planned and executed tool calls in order.
- Asked clarifying questions proactively.
- Response time: ~2–3 seconds. That's exactly the kind of balance between accuracy and speed I want.
What I’m Looking For
I’d love to hear from folks who’ve experimented with:
- Alternative agent architectures (beyond ReAct and Planner–Executor).
- Ideas for reducing latency while maintaining reasoning quality.
- Caching, parallel tool execution, or lightweight planning approaches (see the sketch after this list).
- Ways to replicate Claude's behavior using open-source models (I'm constrained to Mistral, LLaMA, and GPT-OSS).
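On the parallel-execution point, this is the kind of thing I've been considering. It assumes an async MCP client session exposing a `call_tool(name, arguments)` method; independent calls get fanned out with `asyncio.gather`:

```python
import asyncio

async def run_independent_calls(session, calls):
    """Fan out tool calls that don't depend on each other's outputs.
    `session` is whatever async MCP client you use; `calls` is a list of
    (tool_name, arguments) pairs."""
    tasks = [session.call_tool(name, args) for name, args in calls]
    return await asyncio.gather(*tasks)  # total latency = slowest single call

# Hypothetical usage: weather and calendar don't feed each other,
# so they can run concurrently instead of back to back.
# results = await run_independent_calls(session, [
#     ("get_weather", {"city": "Pune"}),
#     ("get_calendar", {"date": "2025-01-10"}),
# ])
```

Of course, this only helps when the agent can tell which steps are independent, which seems like another argument for producing an explicit plan (even a lightweight one) rather than pure step-by-step ReAct.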
Lastly, I realize Claude models are much stronger than current open-source LLMs, but I'm curious how Claude achieves such fluid tool use.
- Is it primarily due to their highly optimized system prompts and fine-tuned model behavior?
- Are they using some form of internal agent architecture or workflow orchestration under the hood (like a hidden planner/executor system)?
If it’s mostly prompt engineering and model alignment, maybe I can replicate some of that behavior with smart system prompts. But if it’s an underlying multi-agent orchestration, I’d love to know how others have recreated that with open-source frameworks.