Am I using embeddings wrong, or are they the wrong approach?

Hello everyone

My problem is:
I have one embedding for each of the following lines:

Core 3 100U
Core 3 100U (with IPU)
Core 5 120U
Core 7 150U
Core i9-14900KS
Core i9-14900
... etc

When I query with just "14900", for example, I get great similarity scores, but if I pass in a longer text that contains the same information ("14900"), the results are completely wrong.

Is there something wrong with my understanding of how embeddings work? Do I have to split the query somehow to make it work in a case like this?
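One way to try the "split the query" idea is to pull the CPU-like substring out of the long text first and embed only that. The sketch below is a minimal illustration, not a tested recipe: `CPU_PATTERN` and `extract_cpu_names` are hypothetical names, and the pattern is an assumption that only covers the naming styles listed in this post.

```python
import re

# Hypothetical pattern: it only covers the two naming styles that appear in
# this post ("Core i9-14900KS" and "Core 3 100U") and would need extending.
CPU_PATTERN = re.compile(
    r"Core\s+(?:i[3579]-\d{4,5}[A-Z0-9]*"  # e.g. Core i7-1260P, Core i5-1135G7
    r"|\d\s+\d{3}[A-Z]*)"                   # e.g. Core 3 100U, Core 7 150U
)


def extract_cpu_names(text: str) -> list[str]:
    """Return every substring of text that looks like a CPU name."""
    return CPU_PATTERN.findall(text)


query = "Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10"
candidates = extract_cpu_names(query)
print(candidates)  # ['Core i7-1260P']
```

Each extracted candidate could then be embedded and compared against the CPU list instead of the full text, so the unrelated words no longer influence the vector.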

Sample query:
Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10

This is what I get as a result:

  {
    "similarity": 0.8905724048320375,
    "cpu": "Core i5-5287U"
  },
  {
    "similarity": 0.8909236816567425,
    "cpu": "Core i5-1135G7 (with IPU)"
  }

I am using Supabase/gte-small with transformers.js and the cosine similarity function from ml-distance.
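For reference, here is a minimal sketch of what a cosine similarity score measures, assuming ml-distance follows the standard definition (dot product divided by the product of the vector lengths). The function name `cosine_similarity` is only illustrative. The score compares the direction of two whole-text vectors, so a query that also contains "Windows 11 Home ..." is embedded as a different vector than the bare CPU name.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```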

It would be great if anyone could help me with this or point me in the right direction.

Thanks a lot

Well, I found a solution in the end.

I now use the FlagEmbedding library for Python, following the example given for a Hugging Face model.
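The post does not say which model or example was used, so the following is only a rough sketch of what such a setup might look like. The model name `BAAI/bge-small-en-v1.5`, the query instruction string, and the helper names `embed_with_flagembedding` and `best_match` are all assumptions made for illustration.

```python
def embed_with_flagembedding(queries, passages,
                             model_name="BAAI/bge-small-en-v1.5"):
    """Embed queries and passages; returns (query_vectors, passage_vectors).

    Assumptions: FlagEmbedding is installed, and model_name (an assumption,
    the post does not name the model) can be downloaded from Hugging Face.
    """
    from FlagEmbedding import FlagModel  # imported lazily: heavy dependency

    model = FlagModel(
        model_name,
        query_instruction_for_retrieval=(
            "Represent this sentence for searching relevant passages: "
        ),
    )
    return model.encode_queries(queries), model.encode(passages)


def best_match(scores, labels):
    """Return the (label, score) pair with the highest score."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[index], scores[index]


# Usage sketch (downloads the model on first run):
# cpus = ["Core i7-1260P", "Core i5-5287U", "Core i9-14900KS"]
# q, p = embed_with_flagembedding(["... Intel Core i7-1260P ..."], cpus)
# print(best_match((q @ p.T)[0], cpus))
```

Whether a different model alone fixes the long-query case would still need to be checked against real data.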
