With the release of Claude Opus 4.5, I wanted to understand how it behaves compared to GPT-5.1 and Gemini 3 inside an identical RAG pipeline.
Here’s what I found:
- Opus is far more structured than Gemini
- More coherent than GPT-5.1, which tends to add extra "helpful" details
- And importantly: it produced the cleanest reasoning across all models tested
RAG systems commonly fail through answer drift, over-extraction, and messy reasoning, and Opus seemed to handle those failure modes more reliably than the others.
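For anyone who wants to reproduce this kind of comparison, here's a minimal sketch of the harness idea: feed the same retrieved context and question to every model and collect the raw answers side by side. All names here (`build_prompt`, `compare_models`, the stub backends) are illustrative, not from my actual pipeline; you'd swap the lambdas for real API clients.

```python
# Minimal sketch: run an identical RAG-style prompt through several
# model backends and collect the answers for side-by-side review.
# The backends dict maps a model name to a callable taking the prompt;
# the lambdas below are deterministic stand-ins for real API calls.

def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble the same RAG prompt for every model under test."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using ONLY the context below. Cite sources as [n].\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )

def compare_models(question: str, contexts: list[str], backends: dict) -> dict:
    """Return {model_name: answer} for the identical prompt across backends."""
    prompt = build_prompt(question, contexts)
    return {name: ask(prompt) for name, ask in backends.items()}

# Stub backends for demonstration; replace with real model clients.
backends = {
    "model-a": lambda p: "Paris [1]",
    "model-b": lambda p: "The capital is Paris [1], a lovely city worth visiting.",
}

results = compare_models(
    "What is the capital of France?",
    ["France's capital is Paris."],
    backends,
)
for name, answer in results.items():
    print(f"{name}: {answer}")
```

From there you can eyeball the outputs for drift or over-extraction, or bolt on an automated judge; the point is just that every model sees byte-identical input.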
Would love to discuss if anyone else has seen the same patterns.
