Am I using embeddings wrong, or is it the wrong approach?

Virionz · April 26, 2024, 8:25pm

Hello everyone

My problem is this: I have one embedding per line for the following data:

Core 3 100U
Core 3 100U (with IPU)
Core 5 120U
Core 7 150U
Core i9-14900KS
Core i9-14900
... etc

When I query with a short string such as “14900”, I get great similarity scores. But if I pass in a longer text that contains the same information (“14900” somewhere inside it), the results are completely wrong.

Is there something wrong with my understanding of how embeddings work? Do I have to split the query somehow to make it work in such a case?

Sample query:
Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10

This is what I get as a result:

{
  "similarity": 0.8905724048320375,
  "cpu": "Core i5-5287U"
},
{
  "similarity": 0.8909236816567425,
  "cpu": "Core i5-1135G7 (with IPU)"
}

I am using Supabase/gte-small with Transformers.js and the cosine similarity function from ml-distance.

It would be great if anyone could help me with this or point me in the right direction.

Thanks a lot.
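For anyone reading along: on the "do I have to split the query" question, one option that is sometimes tried is to cut the long query into short overlapping word windows, embed each window, and keep the best-scoring candidate. The idea is that a single embedding of the whole text has to represent everything in it, so the CPU name may get diluted. Below is a minimal sketch of that idea, not a confirmed fix for this case. The names `sliding_windows`, `best_match`, and the `embed` callback are hypothetical; `embed` stands in for whatever model call you use (for example, a gte-small wrapper).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sliding_windows(text: str, size: int = 3) -> List[str]:
    """Split text on whitespace and return overlapping windows of `size` words."""
    tokens = text.split()
    if len(tokens) <= size:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def best_match(
    query: str,
    candidates: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical: wraps your embedding model
    size: int = 3,
) -> Tuple[str, float]:
    """Return the (candidate, score) pair with the highest score over all windows."""
    candidate_vecs = [(name, embed(name)) for name in candidates]
    best: Tuple[str, float] = ("", -1.0)
    for window in sliding_windows(query, size):
        window_vec = embed(window)
        for name, vec in candidate_vecs:
            score = cosine_similarity(window_vec, vec)
            if score > best[1]:
                best = (name, score)
    return best
```

The window size is a guess and would need tuning against real data. Windows are built from whitespace-separated words, so glued text such as "64-bitIntel" in the sample query stays a single token.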

Virionz · April 26, 2024, 11:43pm

Well, I found a solution in the end.

I now just use the Python FlagEmbedding library, following an example given on a Hugging Face model page.
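The post does not say which model or example was used, so the following is only a guess at what such a setup might look like. It follows the usage pattern shown on BGE model cards: `FlagModel` with `encode_queries` for the query and `encode` for the stored lines. The model name `BAAI/bge-small-en-v1.5`, the instruction string, and the helper names `rank_by_score` and `search_cpus` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def rank_by_score(
    labels: Sequence[str], scores: Sequence[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Pair each label with its score and return the top_k, highest score first."""
    paired = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, float(score)) for label, score in paired[:top_k]]


def search_cpus(query: str, cpus: Sequence[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Embed the query and the CPU names with FlagEmbedding and rank the names."""
    # Imported lazily so rank_by_score works without the library installed.
    from FlagEmbedding import FlagModel

    model = FlagModel(
        "BAAI/bge-small-en-v1.5",  # assumption: the thread does not name the model
        query_instruction_for_retrieval=(
            "Represent this sentence for searching relevant passages: "
        ),
        use_fp16=False,  # keep full precision; fp16 mainly helps on GPU
    )
    query_vec = model.encode_queries([query])  # array of shape (1, dim)
    cpu_vecs = model.encode(list(cpus))        # array of shape (len(cpus), dim)
    scores = (query_vec @ cpu_vecs.T)[0]       # inner product as the score
    return rank_by_score(cpus, scores.tolist(), top_k)
```

Calling `search_cpus("Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10", cpu_lines)` with your own list of CPU lines would download the model on first use and return the top-ranked names with their scores.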
