Sentence similarity - how to train it dynamically

cherry2234 · September 18, 2023, 9:50am

I am new to machine learning and want to implement the following task: map the words in one column to the words in another column based on word similarity (many-to-one where one-to-one is not possible). From the research I have done so far as a beginner, the two best approaches I found are:

1. Use a pretrained sentence-similarity model and cosine similarity to match the most similar words across the two columns by similarity score, then fine-tune it so that the pairs we want get the highest similarity.
2. Use a classification model and train it so that similar words are predicted into the same label.
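
To make method 1 concrete, here is a minimal sketch of the matching step only (no fine-tuning). It is an illustration under stated assumptions, not a working solution: `embed` is a stand-in that uses character trigram counts instead of a pretrained sentence-similarity model, so the sketch runs with the standard library alone, and the column values are hypothetical examples. In a real setup the model's embeddings would take the place of `embed`.

```python
import math
from collections import Counter


def embed(word, n=3):
    """Stand-in embedding: character n-gram counts (NOT a pretrained model)."""
    padded = f" {word.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_columns(source, target):
    """Map every source word to its most similar target word (many-to-one allowed)."""
    target_vecs = {t: embed(t) for t in target}
    mapping = {}
    for s in source:
        s_vec = embed(s)
        best = max(target, key=lambda t: cosine(s_vec, target_vecs[t]))
        mapping[s] = (best, cosine(s_vec, target_vecs[best]))
    return mapping


# Hypothetical columns, only for illustration.
source_column = ["cust_name", "customer name", "order_date"]
target_column = ["customer", "date"]

for word, (match, score) in match_columns(source_column, target_column).items():
    print(f"{word!r} -> {match!r} (score={score:.2f})")
```

Because each source word independently picks its best target, several source words can map to the same target word, which is the many-to-one case.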

But the problem with method 1 is:
* It does not learn fast enough. Say I give it two columns of words and the accuracy over all the matchings is 80%. After I fine-tune on that same data, it still does not give me 100%. I understand that a pretrained model cannot change its parameters drastically from so little data, but how can I tackle this if I want 100% accuracy when the model sees the same data a second time?

The problem with method 2 is:
* The number of labels has to grow dynamically as new columns of data are encountered at run-time, but a classification model has a fixed number of labels. Can we add labels accurately at run-time, based on the data the model works with, without manual intervention? For example, if the model sees a completely new mapping which it has never seen and which does not remotely belong to any existing label, it should create a new label and put it into that…
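
The run-time label creation described above can be sketched as nearest-prototype assignment with a similarity cut-off: a word joins the closest existing label, or opens a new label when nothing is close enough. This is only a sketch of the desired behaviour, not a trained classifier. The class name `OpenSetLabeler`, the threshold value, the trigram-count `embed` stand-in, and the example words are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(word, n=3):
    """Stand-in embedding: character n-gram counts (NOT a pretrained model)."""
    padded = f" {word.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OpenSetLabeler:
    """Assigns words to existing labels, or opens a new label when nothing is close."""

    def __init__(self, threshold=0.25):
        self.threshold = threshold  # hypothetical cut-off; would need tuning on real data
        self.prototypes = {}        # label id -> summed n-gram counts of its members

    def assign(self, word):
        vec = embed(word)
        best_label, best_score = None, 0.0
        for label, proto in self.prototypes.items():
            score = cosine(vec, proto)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is None or best_score < self.threshold:
            # Nothing is similar enough: create a new label at run time.
            best_label = len(self.prototypes)
            self.prototypes[best_label] = Counter()
        # Fold the word into the prototype of the label it was assigned to.
        self.prototypes[best_label].update(vec)
        return best_label


labeler = OpenSetLabeler()
for word in ["customer", "cust_name", "order_date", "customer name", "date"]:
    print(word, "->", labeler.assign(word))
```

The threshold decides how eagerly new labels appear: too low and unrelated words get merged into one label, too high and near-duplicates each get a label of their own.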

Please feel free to suggest any solution, or any other implementation or model idea, for the task described above. Thanks a looottt!!
