Training a SentenceTransformers model for address similarity

Hey all,

I’m looking to fine-tune a sentence transformer model to do address similarity. I have a dataset modeled as follows: chiragshahcompass/addy (Hugging Face Datasets). I’m using MultipleNegativesRankingLoss as the loss function. My ultimate goal is to run address similarity in batch so I don’t need to fumble around with Levenshtein or other fuzzy string matching. Any thoughts on whether this is the best approach, or are there other options to consider? Thanks in advance!
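For anyone unfamiliar with the loss, here is a rough pure-Python sketch of what MultipleNegativesRankingLoss computes, as far as I understand it: every (anchor, positive) pair in a batch treats the other positives in the batch as negatives, and the loss is cross-entropy over scaled cosine similarities. The function names and toy 2-D vectors are made up for illustration, and the scale of 20 is what I believe the library default to be. This is not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct 'class' for anchors[i] is positives[i].

    All other positives in the batch act as in-batch negatives.
    scale=20.0 is an assumed default, not confirmed from the library source.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D "embeddings" (illustrative only, not real model output)
anchors = [[1.0, 0.0], [0.0, 1.0]]
well_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive sits near its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]       # each positive sits near the wrong anchor

loss_good = multiple_negatives_ranking_loss(anchors, well_aligned)
loss_bad = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_good < loss_bad)
```

The practical takeaway for addresses is that the batch should not contain two pairs that refer to the same address, since one pair's positive would then be penalized as a negative for the other.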

Have you tried a sentence transformer with k-means and cosine-similarity clustering for matching?
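The cosine-similarity half of that idea can be sketched roughly as below. To keep it runnable without downloading a model, `embed` is a hypothetical stand-in that uses character trigram counts rather than a real sentence transformer, the k-means step is left out, and the addresses are invented toy data. With a real model you would swap `embed` for the model's encoding call.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Hypothetical stand-in for a sentence-transformer encoder:
    a sparse vector of character n-gram counts."""
    s = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_matches(queries, candidates):
    """For each query, return (query, best candidate, similarity score)."""
    cand_vecs = [embed(c) for c in candidates]
    results = []
    for q in queries:
        qv = embed(q)
        scores = [cosine(qv, cv) for cv in cand_vecs]
        best = max(range(len(candidates)), key=scores.__getitem__)
        results.append((q, candidates[best], scores[best]))
    return results

# Invented toy addresses, for illustration only
candidates = ["123 Main Street, Springfield", "45 Oak Avenue, Portland"]
queries = ["123 main st springfield", "45 oak ave portland"]

matches = best_matches(queries, candidates)
for query, match, score in matches:
    print(f"{query!r} -> {match!r} ({score:.2f})")
```

In practice you would probably also set a minimum score below which a query counts as unmatched, rather than always accepting the top candidate.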

Thanks for the reply @deadbod-81. I have tried it out of the box, and it seems to work fine. I just figured a fine-tuned model made more sense, as I have a bunch of nonsensical address similarities that ordinary similarity clustering probably wouldn’t get right.

Hi. Did you have any joy with this? What model did you use as a baseline and did the fine-tuning help in the end?