I’m working on an internal, confidential document-retrieval system where no data can be sent to external LLM APIs (OpenAI, Anthropic, etc.). All processing must be self-hosted using open-source models.
How do I build a RAG pipeline that can retrieve both text (text retrieval is already solved — my colleagues have that working) and figures/images from documents (primarily PDFs)? In our domain, figures are often as important as the text (architecture diagrams, charts, schemas). I'm not asking for fancy image generation — just retrieving the existing images along with the relevant text context.
I'm trying to understand what a correct end-to-end pipeline looks like, ideally from people who've built this in practice. Any guidance would be appreciated.
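To make the ask concrete, here's roughly the retrieval shape I have in mind: each indexed chunk keeps pointers to figures extracted from the same page, so a text-embedding hit returns both the text and its associated images. This is a toy sketch — all names, file paths, and vectors below are made up, and in reality the embeddings would come from a self-hosted model and the figures from a PDF extraction step:

```python
import math

# Hypothetical index: each chunk stores page text, paths to figures
# extracted from the same page, and a text embedding. The tiny 3-d
# vectors stand in for real embeddings from a self-hosted model.
chunks = [
    {"text": "System architecture overview",
     "images": ["figs/p3_architecture.png"],
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Quarterly revenue chart and discussion",
     "images": ["figs/p7_revenue_chart.png"],
     "embedding": [0.1, 0.9, 0.1]},
    {"text": "Appendix: glossary of terms",
     "images": [],
     "embedding": [0.0, 0.2, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_embedding, k=1):
    """Rank chunks by similarity; each hit carries text AND its figures."""
    ranked = sorted(chunks,
                    key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the "architecture" chunk.
hits = retrieve([1.0, 0.0, 0.0])
print(hits[0]["text"], hits[0]["images"])
```

Is this chunk-with-figure-pointers approach what people actually use, or do production systems embed the images themselves (e.g. with a CLIP-style model) in a separate index?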