Pre-trained embedding model on API specification files for RAG use case

vradhik · June 23, 2025, 3:49pm

Hello, I have a set of files containing API specifications. Some of them are JSON/YAML files that follow the Swagger or OpenAPI specification, but most are either Markdown or multi-sheet Excel files.
The Markdown and Excel files don't follow a standard template, but they contain API specification info such as the path, method, request, response, return codes, etc.
For technical documents like these, what would be a good pre-trained embedding model for returning precise results in an API catalog search / chat / "find APIs for this user story" kind of RAG use case?
Let me know if you need further details. Any recommendation or direction in this regard is appreciated.

John6666 · June 24, 2025, 3:04am

For RAG applications, I think the first step is to find an embedding model with good retrieval performance from the MTEB leaderboard and try it out. For that type of document, it might be better to use a straightforward, high-performance model rather than a specialized model. If multilingual support is required, it might be a good idea to use a larger model.
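To make "try it out" concrete, here is a minimal sketch of a recall@k check you could run against each candidate model with a handful of your own query/endpoint pairs. Everything in it is an assumption for illustration: `toy_embed` is a bag-of-words stand-in rather than a real embedding model, and the sample endpoints and queries are made up.

```python
import math
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (word counts). Swap in a real model to compare candidates."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(
    embed: Callable[[str], Vector],
    docs: Dict[str, str],
    queries: Dict[str, str],
    k: int = 1,
) -> float:
    """Fraction of queries whose expected doc id appears in the top-k results.

    `queries` maps query text to the id of the doc that should be retrieved.
    """
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for query, expected_id in queries.items():
        q = embed(query)
        ranked = sorted(doc_vecs, key=lambda d: cosine(q, doc_vecs[d]), reverse=True)
        hits += expected_id in ranked[:k]
    return hits / len(queries)


# Hypothetical sample data: one text chunk per endpoint, plus labelled queries.
SAMPLE_DOCS = {
    "create_user": "POST /users create a new user account request name email response 201",
    "get_order": "GET /orders/{id} fetch a single order by id response 200 404",
    "refund": "POST /payments/{id}/refund refund a captured payment response 202",
}
SAMPLE_QUERIES = {
    "create a new user account": "create_user",
    "fetch order by id": "get_order",
    "refund a payment": "refund",
}

if __name__ == "__main__":
    print(recall_at_k(toy_embed, SAMPLE_DOCS, SAMPLE_QUERIES, k=1))
```

To compare real models, you would replace `toy_embed` with a function that calls the model and adapt `cosine` to whatever vector type it returns. The same labelled queries can then be reused for every candidate, so the scores are comparable.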

Yesseniar · June 24, 2025, 3:34am

I’ve worked on a similar project where we had to deal with inconsistent API documentation formats like markdown and Excel sheets. Using a model like OpenAI’s embeddings combined with some custom preprocessing to normalize the data worked well. Also, for quick brainstorming or even casual coding breaks, I sometimes use Omegle to chat with strangers about tech topics—it’s surprisingly helpful to get fresh perspectives on tricky problems!
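As a rough illustration of the normalization step, the sketch below turns each endpoint into one uniformly formatted text chunk before embedding, regardless of where it came from. The record fields (`path`, `method`, `summary`, `return_codes`) and the sample data are assumptions chosen for the example, not a description of the actual pipeline used in that project. Rows from Markdown or Excel would need their own parser that produces the same record shape.

```python
from typing import Iterator

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def openapi_to_records(spec: dict) -> Iterator[dict]:
    """Yield one flat record per (path, method) from a parsed OpenAPI/Swagger dict."""
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            yield {
                "path": path,
                "method": method.upper(),
                "summary": op.get("summary", ""),
                # str() because YAML parsers may load status codes as integers
                "return_codes": sorted(str(c) for c in op.get("responses", {})),
            }


def record_to_text(record: dict) -> str:
    """Render a record as one text chunk, using the same layout for every source."""
    lines = [f"{record['method']} {record['path']}"]
    if record.get("summary"):
        lines.append(f"Summary: {record['summary']}")
    if record.get("return_codes"):
        lines.append("Return codes: " + ", ".join(record["return_codes"]))
    return "\n".join(lines)


# Hypothetical inputs: a tiny OpenAPI fragment and a row already parsed from a spreadsheet.
SAMPLE_SPEC = {
    "paths": {
        "/users": {
            "parameters": [],
            "post": {"summary": "Create a user", "responses": {"201": {}, "400": {}}},
        }
    }
}
SPREADSHEET_ROW = {
    "path": "/orders/{id}",
    "method": "GET",
    "summary": "Fetch one order",
    "return_codes": ["200", "404"],
}

if __name__ == "__main__":
    for rec in list(openapi_to_records(SAMPLE_SPEC)) + [SPREADSHEET_ROW]:
        print(record_to_text(rec), end="\n\n")
```

Keeping one chunk per endpoint with a fixed layout means the embedding model sees the path, method, and return codes in the same position every time, whichever file format the endpoint was documented in.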
