Codebase Embedding

dksensei · January 23, 2025, 3:19pm

Hi, I want to build a RAG chatbot based on a large local codebase of mine. Does anyone know what the best method would be to generate embedding vectors for the codebase?

I am currently using DeepSeek-R1 for chat. For embeddings, I don't know which model can handle a codebase this large. I also don't know how to give the model the directory structure and the way the files are interlinked.
