Opus 4.5 looks like the new best model for RAG

With the release of Claude Opus 4.5, I wanted to understand how it behaves compared to GPT-5.1 and Gemini 3 inside an identical RAG pipeline.

Here’s what I found:

  • Opus's answers were far more structured than Gemini 3's

  • They were more coherent than GPT-5.1's, which tends to add extra “helpful” details

  • And importantly: Opus provided the cleanest reasoning of all the models tested

RAG systems tend to fail through drift, over-extraction, and messy reasoning, and Opus seemed to handle those cases more reliably than the others.
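
For anyone who wants to try a similar comparison, here is a rough sketch of what "identical pipeline" can mean in practice: retrieval and prompt construction run once, and only the model call is swapped. This is an illustration under assumptions, not the exact setup used above. The toy word-overlap retriever and the names `retrieve`, `build_prompt`, and `compare_models` are hypothetical, and each model is just a function from prompt to answer that you would wire to the provider client of your choice.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is any function mapping a prompt to an answer.
ModelFn = Callable[[str], str]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Build one prompt that every model receives unchanged."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def compare_models(
    query: str, corpus: List[str], models: Dict[str, ModelFn]
) -> Dict[str, str]:
    """Run retrieval once, then send the same prompt to each model."""
    passages = retrieve(query, corpus)
    prompt = build_prompt(query, passages)
    return {name: fn(prompt) for name, fn in models.items()}
```

Because every model sees the same retrieved passages and the same prompt, differences in the answers (extra details, drift away from the context) can be attributed to the model rather than to the retrieval step.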

Would love to discuss if anyone else saw the same patterns.
