I have data with multiple descriptions of the same product, but they may be worded slightly differently, contain different product ids, etc. I would like to group them together. Would a reasonable approach here be to:
- Train a classifier on the products to predict, say, product category
- Use the description embeddings to compute cosine similarity, and then
- Group descriptions whose similarity exceeds some threshold?
Is there a better approach to this?
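
To make the similarity-and-threshold part concrete, here is a minimal sketch of what I have in mind. It is only an illustration under some assumptions: `embed` is a toy word-count stand-in for real description embeddings, the function names, the sample descriptions and the `0.5` threshold are all made up for the example, and the classifier step is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(descriptions, threshold=0.5):
    """Link every pair at or above the threshold, then return the linked groups."""
    vectors = [embed(d) for d in descriptions]
    parent = list(range(len(descriptions)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare all pairs; this is O(n^2), fine for a sketch but not for large data.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, text in enumerate(descriptions):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


# Made-up example data (hypothetical products and ids).
descriptions = [
    "Acme wireless mouse black",
    "Wireless mouse Acme, black (id 123)",
    "Stainless steel water bottle 500ml",
    "Water bottle stainless steel 500 ml",
]
groups = group_by_similarity(descriptions, threshold=0.5)
for group in groups:
    print(group)
```

One thing I am unsure about with this kind of grouping: because pairs are linked transitively, two descriptions can end up in the same group through a chain of intermediate matches even if they are not similar to each other directly.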