Hello everyone
My problem is:
I have embeddings for the following data, one entry per line:
Core 3 100U
Core 3 100U (with IPU)
Core 5 120U
Core 7 150U
Core i9-14900KS
Core i9-14900
etc.
When I query with just “14900”, for example, I get good similarity scores. But if I pass in a longer text that contains the same information (“14900” somewhere inside it), the results are completely wrong.
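A toy illustration of one likely cause, assuming the sentence embedding is the mean of the token embeddings (the `pooling: 'mean'` setting used in the Supabase/gte-small model card example): the one relevant token only contributes a fraction of the pooled vector, so a long query drifts away from a short catalog entry. The vectors here are made up (orthogonal one-hot "token embeddings"), and `similarityWithNoise` is a hypothetical helper; real contextual embeddings are not orthogonal.

```javascript
// Toy model of mean-pooling dilution. All vectors are hypothetical one-hot
// "token embeddings", not real gte-small output.

// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// One-hot vector of length dim with a 1 at position i.
function oneHot(i, dim) {
  const v = new Array(dim).fill(0);
  v[i] = 1;
  return v;
}

// Average a list of equal-length vectors component-wise (mean pooling).
function meanPool(vectors) {
  const dim = vectors[0].length;
  const out = new Array(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) out[i] += v[i] / vectors.length;
  }
  return out;
}

const DIM = 10;
const target = oneHot(0, DIM); // stands in for the "14900" part of the query

// Similarity between the target and a query made of the target plus n unrelated tokens.
function similarityWithNoise(n) {
  if (n < 0 || n >= DIM) throw new RangeError(`n must be between 0 and ${DIM - 1}`);
  const tokens = [target];
  for (let i = 1; i <= n; i++) tokens.push(oneHot(i, DIM));
  return cosine(meanPool(tokens), target);
}

console.log(similarityWithNoise(0).toFixed(3)); // 1.000
console.log(similarityWithNoise(9).toFixed(3)); // 0.316
```

In this toy model the score is exactly 1/√(n+1). Real embeddings will not drop this cleanly; the sketch only shows why a short, focused query can score very differently from a long one.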
Is there something wrong with my understanding of how embeddings work? Do I have to split the query somehow to make it work in a case like this?
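Splitting the query is one common workaround. A minimal sketch, assuming each short window is embedded separately and each catalog entry keeps its best window score: the hypothetical helper `slidingWindows` cuts a long text into overlapping word windows. It also inserts a space at lowercase-to-uppercase boundaries so fused words like `64-bitIntel` become `64-bit Intel`; that is a heuristic assumption about this data (it would also split words like "iPhone"), not something the model requires.

```javascript
// Hypothetical helper: split a long query into overlapping word windows so that each
// window can be embedded on its own instead of embedding the whole text at once.
function slidingWindows(text, windowSize = 3, stride = 1) {
  // stride <= windowSize guarantees that no word is skipped between windows.
  if (windowSize < 1 || stride < 1 || stride > windowSize) {
    throw new RangeError("need 1 <= stride <= windowSize");
  }

  // Heuristic for this data: split fused words such as "64-bitIntel" -> "64-bit Intel".
  const cleaned = text.replace(/([a-z])([A-Z])/g, "$1 $2");
  const words = cleaned.split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return [];

  const windows = [];
  for (let start = 0; ; start += stride) {
    const end = Math.min(start + windowSize, words.length);
    windows.push(words.slice(start, end).join(" "));
    if (end === words.length) break; // this window reaches the end of the text
  }
  return windows;
}

const query = "Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10";
console.log(slidingWindows(query));
```

One of the resulting windows is `Intel Core i7-1260P`, which is much closer in length and content to the catalog entries than the full product description.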
Sample query:
Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10
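Because the catalog names follow a fairly regular pattern, another option is to pull the CPU model out of the long text with a regular expression first and then embed or look up only that string. This is a rule-based step placed in front of the embeddings, not an embedding technique. The pattern `CPU_PATTERN` and the helper `extractCpuCandidates` are hypothetical and written only against the names listed above; other schemes (Core Ultra, AMD, suffixes like "(with IPU)") would need extra rules.

```javascript
// Hypothetical pattern for the CPU names listed above:
//   "Core i9-14900KS", "Core i5-1135G7" -> Core i<tier>-<4-5 digits><suffix>
//   "Core 3 100U", "Core 7 150U"        -> Core <tier> <3 digits><suffix>
const CPU_PATTERN = /Core\s+(?:i[3579]-\d{4,5}[A-Z0-9]*|[3579]\s+\d{3}[A-Z]*)/g;

// Return every CPU-like substring found in a free-text product description.
function extractCpuCandidates(text) {
  // String.prototype.match with a /g regex returns all matches, or null if none.
  return text.match(CPU_PATTERN) || [];
}

const query = "Windows 11 Home 64-bitIntel Core i7-1260P Prozessor Dodeca-Core 2,10";
console.log(extractCpuCandidates(query)); // [ 'Core i7-1260P' ]
```

"Dodeca-Core 2,10" is not picked up because the digit after "Core" is not a valid tier in this pattern.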
This is what I get as a result:
```json
[
  {
    "similarity": 0.8905724048320375,
    "cpu": "Core i5-5287U"
  },
  {
    "similarity": 0.8909236816567425,
    "cpu": "Core i5-1135G7 (with IPU)"
  }
]
```
I am using Supabase/gte-small with Transformers.js and the cosine similarity function from ml-distance.
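A minimal sketch of the scoring side, assuming there is already one embedding per query window and one per catalog entry (for example from Transformers.js `pipeline('feature-extraction', 'Supabase/gte-small')` called with `{ pooling: 'mean', normalize: true }`, as in the model card example). The cosine function is written out by hand only to keep the sketch self-contained; it computes the same quantity as the cosine similarity taken from ml-distance. `rankCatalog` and the `{ cpu, embedding }` shape are hypothetical names, and each entry is scored by its best-matching window rather than by the whole text.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score each catalog entry by its best match over all
// query-window embeddings, then sort from most to least similar.
// queryVectors must be non-empty (Math.max() of nothing is -Infinity).
function rankCatalog(queryVectors, catalog, topK = 5) {
  return catalog
    .map(({ cpu, embedding }) => ({
      cpu,
      similarity: Math.max(...queryVectors.map((q) => cosineSimilarity(q, embedding))),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}

// Toy 2-D vectors standing in for real gte-small embeddings.
const windowVectors = [[1, 0], [0, 1]];
const catalog = [
  { cpu: "Core 5 120U", embedding: [1, 1] },
  { cpu: "Core i9-14900", embedding: [1, 0.1] },
];
console.log(rankCatalog(windowVectors, catalog));
```

With `normalize: true` the vectors are unit length, so the cosine reduces to a plain dot product. Taking the maximum over windows is one design choice; averaging the top few window scores is a possible alternative.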
It would be great if anyone could help me with this or point me in the right direction.
Thanks a lot!