Why does RAG still feel clunky in 2025?

Let’s just be honest for a second.
Everyone’s saying “RAG is the future.”
But… have you really tried building one that doesn’t fall apart on contact?


Most of what we call “RAG” today is still a fragile dance of glue code and faith:

  • One bad chunk split? Bye-bye relevance.
  • Vector DB latency? Now your agent sounds drunker than me.
  • Grounded answers? Sure, until someone asks “why” twice.
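The "one bad chunk split" failure mode is often softened with overlapping windows, so a sentence cut at a boundary still survives intact in the neighboring chunk. A minimal sketch — the window and overlap sizes here are arbitrary placeholders, not recommendations from this thread:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping character windows.

    Each chunk shares `overlap` characters with the previous one, so
    content straddling a boundary appears whole in at least one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Character windows are the crudest possible splitter; sentence- or heading-aware splitting usually retrieves better, but the overlap idea carries over unchanged.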

And if you’ve ever tried to scale this beyond a toy demo, you’ve probably hit one of these walls:

  1. Semantic mismatch – the model sounds fluent but isn’t actually reading the context right.
  2. Retriever overconfidence – grabbing something that feels close but is totally off.
  3. Unnatural prompt stitching – stuffing retrieved docs into the prompt like it’s a sandwich nobody ordered.

All of this gets worse when people assume “just add more tokens” will fix things.
Spoiler: it doesn’t. It just makes the model pretend better.
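One cheap guard against retriever overconfidence (wall 2 above) is to threshold on the similarity score and abstain rather than stuff a weak match into the prompt. A minimal sketch using plain cosine similarity — the 0.75 cutoff and function names are my assumptions for illustration, not anything from this thread:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_or_abstain(query_vec, docs, threshold=0.75):
    """Return (doc_id, score) for the best match, or (None, score) when
    even the best match sits below the cutoff -- better to say "not
    found" than to ground an answer in something that merely feels close.
    `docs` maps doc_id -> embedding vector."""
    best_doc, best_score = None, -1.0
    for doc_id, vec in docs.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_doc, best_score = doc_id, score
    if best_score < threshold:
        return None, best_score
    return best_doc, best_score
```

In practice the threshold should be calibrated per embedding model on a held-out set of known-bad queries rather than hard-coded.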


There’s also an elephant in the room:
The current generation of LLMs was never built with retrieval in mind.
We’re still trying to retrofit memory into an architecture that was trained to forget.


So… yeah. RAG sounds great. In practice, it’s still rough.
Maybe we should talk more openly about that.

Curious how others are navigating this.
Has anyone found setups that actually feel smooth and scalable?

2 Likes

Yep — totally agree with this framing.

At some point, I started realizing that even when retrieval is logically sound, the generation still slips. It’s like you’re building on factual memory, but the semantic scaffolding isn’t quite there — so the output ends up coherent on the surface, but not structurally grounded.

We’ve been playing with different ways to observe these tension points — especially when answers feel “right” but originate from a semantically shifted zone. Still just scratching the surface, but I love seeing others digging into the root architecture too, not just the patches.

Really appreciate this thread — super clarifying.

1 Like

Hello @Pimpcat-AU, I'm interested in your approach! Can you share a fragment of Python code showing your indexed lookup?

I was about to publish a topic on this trend :sweat_smile:

I'm working on a little project that searches through a folder of PDF files. I have my RAG working, but I need to make my lookup more accurate.

2 Likes

import os
from collections import defaultdict

# Build inverted index from folder of text files
def build_index(folder_path):
    index = defaultdict(set)  # word -> set of filenames
    for filename in os.listdir(folder_path):
        if not filename.endswith('.txt'):
            continue
        filepath = os.path.join(folder_path, filename)
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read().lower()
        words = set(text.split())
        for word in words:
            index[word].add(filename)
    return index

# Search query in index
def search_index(index, query):
    query_words = query.lower().split()
    results = None
    for word in query_words:
        if word in index:
            if results is None:
                results = index[word].copy()
            else:
                results &= index[word]
        else:
            return set()  # word not found
    return results or set()

# Example usage
folder = '/path/to/text/files'
index = build_index(folder)

query = "your search terms here"
matched_files = search_index(index, query)

print(f"Files matching '{query}':")
for f in matched_files:
    print(f)

6 Likes

I forgot to mention that chunking the data also speeds up indexing.
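Chunking can be folded into the inverted-index approach: index (filename, chunk number) pairs instead of whole files, so a hit points at a small passage rather than an entire document. A minimal sketch, assuming the documents are already read into a dict — the function name and word-count chunk size are mine, not from the post above:

```python
from collections import defaultdict

def build_chunk_index(docs, chunk_size=200):
    """Map each word to the set of (filename, chunk_no) pairs it occurs in.

    `docs` maps filename -> full text (read elsewhere); each chunk is a
    window of `chunk_size` words.
    """
    index = defaultdict(set)
    for filename, text in docs.items():
        words = text.lower().split()
        for chunk_no, start in enumerate(range(0, len(words), chunk_size)):
            for word in set(words[start:start + chunk_size]):
                index[word].add((filename, chunk_no))
    return index
```

Lookup then works the same as before, but the intersection of query words narrows to passages, which is usually what you want to stuff into a RAG prompt anyway.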

3 Likes

Thank you @Pimpcat-AU! I'll try your code to understand it, thank youuu :ok_hand:

3 Likes

Take a look at my post, I think you'll find it interesting:
K3D - The new paradigm for Knowledge.

1 Like

Can you link it please?

2 Likes

Sure, if my post is allowed:
K3D

1 Like

Hey, I took a look through your K3D/3D Knowledge repo. You’ve clearly put in a lot of work pulling together ideas from AI, 3D vector data, and spatial web tech. The range of research is solid, and your documentation covers a lot of ground. You’re tackling some real challenges in how AI can work with spatial and vector knowledge, especially with agents and immersive systems.

My main thought is that your work is at the cutting edge of what people are trying to do with 3D knowledge representation. It’s early days for this field, so the conceptual work and roadmaps make sense. I noticed things are still mostly in the research and planning stage, but that’s how these big shifts always start.

Just to share, I'm currently working on an image-based memory architecture that will be deployed in either my 3rd or 4th generation bots. I'm still in the process of finishing up my Gen 2 bots, so I haven't had the time to fully complete and implement the new memory system yet. I know how to do it, there just aren't enough hours in the day.

Overall, this is a good foundation. I recommend keeping your architecture modular so you can adapt if something better comes along. If you want to discuss deterministic storage or alternative memory systems, I’m happy to talk more.

Keep going, you’re on the right track.

4 Likes

I succeeded :slight_smile: oh wow, didn’t realize it took me 2 months… geez the days are a blur.

1 Like