Seeking Advice on Processing Support Conversations for Efficient RAG Model Search

Hi everyone,

I’m working on a project where I need to process thousands of support conversations and integrate them into a Retrieval-Augmented Generation (RAG) model for efficient searching. My goal is to make it easier to find relevant information from these conversations, but I’m struggling with the best way to structure the data.

Currently, I’m planning to summarize each conversation and embed the summaries with a sentence transformer. I also want to extract key elements like the problem and solution from the conversation. However, I’m unsure how to combine these components in a way that maximizes retrieval quality.

My questions are:

  • What’s the best way to organize these elements for the RAG model? Should I structure the data as summary → problem → solution → raw conversation, or is there a more effective approach that would improve search efficiency?
  • Are there alternative methods or tools besides sentence transformers that could be effective in processing this data? I’m open to any suggestions on how to use this data more efficiently or any best practices for structuring text for RAG models.
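
To make the first question concrete, here is a rough Python sketch of the layout I have in mind. It is only an illustration under my own assumptions: the class name, the field names, and the sample values are all made up, and the choice to embed only the summary and problem (keeping the solution and raw conversation as payload returned with a hit) is exactly the part I am unsure about.

```python
from dataclasses import dataclass, asdict


@dataclass
class ConversationRecord:
    """One support conversation split into the fields from the question.

    All names here are hypothetical; nothing is tied to a specific library.
    """
    conversation_id: str
    summary: str
    problem: str
    solution: str
    raw_conversation: str

    def text_to_embed(self) -> str:
        # Assumption: only the short, distilled fields are embedded,
        # so the vector is not dominated by the long raw transcript.
        return f"Summary: {self.summary}\nProblem: {self.problem}"

    def metadata(self) -> dict:
        # Every field is kept as payload so a search hit can return
        # the solution and the full conversation.
        return asdict(self)


# Made-up sample data, purely for illustration.
record = ConversationRecord(
    conversation_id="conv-0001",
    summary="Customer could not log in after a password reset.",
    problem="Login fails with an 'invalid credentials' error after reset.",
    solution="Clear the cached session and log in again with the new password.",
    raw_conversation="Customer: I reset my password but still can't log in...",
)

# This string is what I would pass to the embedding model.
print(record.text_to_embed())
```

The string from `text_to_embed()` would go to the sentence transformer, and the dictionary from `metadata()` would be stored next to the vector. Whether the solution should also be part of the embedded text is one of the things I would like feedback on.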

If anyone has experience with processing text for RAG models or any suggestions on how to structure this information, I’d really appreciate your insights!

Thanks in advance for your help!

P.S. I’m new to the forum, so if this post doesn’t fit in this category, please let me know.