I am new to machine learning and want to implement the following task: map the words in one column to the words in another column based on word similarity (many-to-one where one-to-one is not possible). From my research so far, the two most promising approaches I found are:
1. Use a pretrained sentence-similarity model and cosine similarity to match the most similar words across the two columns based on their similarity scores, then fine-tune the model so that the pairs we want get the highest similarity.
2. Use a classification model and train it so that similar words are predicted into the same label.
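To make the matching step of method 1 concrete, here is a minimal sketch. It assumes a stand-in `embed` function based on character trigrams so that it runs without downloading anything; in the real setup the vectors would come from the pretrained sentence-similarity model instead. The names `embed`, `cosine`, and `match_columns` and the example column values are hypothetical.

```python
from collections import Counter
import math


def embed(word, n=3):
    """Stand-in embedding: bag of character n-grams (NOT a pretrained model)."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_columns(source, target):
    """Map every source word to its most similar target word.

    Several source words may pick the same target, so the result is
    many-to-one whenever one-to-one is not possible.
    """
    target_vecs = [(t, embed(t)) for t in target]
    mapping = {}
    for s in source:
        s_vec = embed(s)
        best, score = max(
            ((t, cosine(s_vec, t_vec)) for t, t_vec in target_vecs),
            key=lambda pair: pair[1],
        )
        mapping[s] = (best, score)
    return mapping


# Hypothetical example columns
source_col = ["cust_name", "customer name", "order_date"]
target_col = ["customer_name", "order date"]
result = match_columns(source_col, target_col)
for word, (match, score) in result.items():
    print(f"{word!r} -> {match!r} (similarity {score:.2f})")
```

Swapping `embed` for the model's encoder would leave the matching logic unchanged; only the quality of the similarity scores differs.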
But the problem with method 1 is:
* It does not learn fast enough. Say I match two columns of words and 80% of the matches are correct. If I then fine-tune on that same data, the accuracy afterwards is still not 100%. I understand that a pretrained model cannot change its parameters drastically from a small amount of data, but how can I tackle this if I want 100% accuracy when the model sees the same data a second time?
The problem with method 2 is:
* The number of labels has to grow dynamically as new columns of data are encountered at run time, but a classification model has a fixed number of labels. Can labels be added accurately at run time, without manual intervention, based on the data the model works with? For example, if the model sees a completely new mapping that it has never seen before and that does not remotely belong to any existing label, it should create a new label and assign the mapping to it.
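The behaviour described for method 2 could be sketched roughly as follows. This is only an illustration of the desired "create a new label when nothing fits" logic, not a trained classifier: it assumes the same stand-in trigram embedding instead of a real model, and the class name `OpenSetLabeler` and the threshold value `0.4` are hypothetical choices.

```python
from collections import Counter
import math


def embed(word, n=3):
    """Stand-in embedding: bag of character n-grams (NOT a trained model)."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OpenSetLabeler:
    """Assigns words to labels and opens a new label when nothing is close."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold  # assumed cut-off, would need tuning
        self.examples = {}          # label id -> list of embedded examples

    def assign(self, word):
        vec = embed(word)
        best_label, best_score = None, 0.0
        for label, vecs in self.examples.items():
            score = max(cosine(vec, v) for v in vecs)
            if score > best_score:
                best_label, best_score = label, score
        # Nothing similar enough: create a new label without manual intervention
        if best_label is None or best_score < self.threshold:
            best_label = len(self.examples)
            self.examples[best_label] = []
        self.examples[best_label].append(vec)
        return best_label


labeler = OpenSetLabeler()
# Hypothetical words arriving one by one at run time
labels = [labeler.assign(w) for w in
          ["customer_name", "cust_name", "order_date", "order date"]]
print(labels)
```

Because every assigned word is stored as an example of its label, the label set grows only when an incoming word falls below the threshold for all existing labels.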
Please feel free to suggest any other solution, implementation, or model idea for this task. Thanks a lot!