Hi everyone,
I have about a hundred documents describing different courses, which I’ve transformed into embeddings. Based on a user’s input, I want to recommend what they could study next.
Example: “I want to study AI.”
I’m using ChromaDB for similarity search together with OpenAI’s text-embedding-3-large model. ChromaDB returns cosine distances in the range 0 to 2. In my case, the best match has a distance of around 0.8, which seems relatively high, but it’s still the best among the available options.
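For reference, this is roughly what my setup looks like (simplified; the API key, collection name, paths, and course data are placeholders, not my real values):

```python
import chromadb
from chromadb.utils import embedding_functions

# Same OpenAI embedding model for both documents and queries
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",                   # placeholder
    model_name="text-embedding-3-large",
)

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection(
    name="courses",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # cosine distance, range 0..2
)

# In reality ~100 course descriptions; two dummy entries here
courses = [
    {"id": "ml-101", "description": "Introduction to machine learning ..."},
    {"id": "db-201", "description": "Relational databases and SQL ..."},
]
collection.add(
    ids=[c["id"] for c in courses],
    documents=[c["description"] for c in courses],
)

# The user's input is the query
results = collection.query(query_texts=["I want to study AI"], n_results=5)
print(results["distances"][0])  # best match is around 0.8 for me
```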
The distance is high (and looks “bad”) because the user input is much shorter than the course descriptions. That length mismatch will always be there and may even grow over time. In my opinion, the absolute distance doesn’t matter much: all distances may look bad, but the ranking still shows which courses fit best, and the top result is still the best recommendation.
Since the embeddings come directly from the OpenAI model, I don’t think there’s much I can change about how they’re generated.
At the moment, my plan is simply to list the courses sorted by distance, regardless of how “bad” the distances are. This feels almost too simple.
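Concretely, the “just sort by distance” plan is nothing more than this (sketch; `collection` comes from the setup above and the user input is an example):

```python
user_input = "I want to study AI"

# ChromaDB already returns results ordered by ascending distance,
# so listing them in that order is all I do right now.
results = collection.query(query_texts=[user_input], n_results=10)

for course_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(f"{course_id}  (cosine distance: {distance:.3f})")
```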
Is there something else I could be doing to improve the quality of these recommendations? I don’t want to apply a hard threshold, since even larger distances can still be useful.
Any advice or best practices for handling this kind of situation would be greatly appreciated.