Hi @Weilin,
Thank you for the response. I am wondering if you would be able to expand on your suggestions or point me to some resources that would help.
I agree: I’ve noticed that sentence vectors often act like a fancy regex function, but I feel they have real potential for semantic similarity! Still, maybe good sentence vectors for semantic similarity aren’t a thing just yet.
Regarding your suggestions:
A pretrained paraphrase task may be better than similarity task
Do you mean it would be better to train the model on a paraphrasing task? Or do you mean that the end application should use paraphrasing and not similarity?
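If it’s the first reading, here’s roughly what I’m picturing (a minimal sketch, assuming sentence-transformers; the model name is just an example of a paraphrase-pretrained encoder, not something you suggested):

```python
# Rough sketch of "train/use a paraphrase-pretrained model" as I understand it,
# assuming the sentence-transformers library; model name is only an example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

notes = ["first note text ...", "second note text ..."]
query = "example search query"

# Encode notes and query with the same model, then rank notes by cosine similarity
note_embeddings = model.encode(notes, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
print(util.cos_sim(query_embedding, note_embeddings))
```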
Going back to simple vectors (like fastText) and doing your search query on those embedded terms (but this really will only benefit literature search rather than notes due to corpus size)
As far as I know, fastText does not work well on word phrases (sentences), so this approach would have to embed keywords extracted from the search query and embed the notes in a similar fashion. Am I understanding you correctly?
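To check my understanding, this is roughly what I have in mind (a sketch only; the model path and keyword lists are placeholders, and the keywords would come from some extraction step I haven’t specified):

```python
# Minimal sketch of keyword-level fastText matching, assuming the official
# fasttext Python package and pretrained English vectors (path is assumed).
import numpy as np
import fasttext

model = fasttext.load_model("cc.en.300.bin")

def embed_keywords(keywords):
    # Average the fastText vectors of the extracted keywords
    return np.mean([model.get_word_vector(w) for w in keywords], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder keywords; in practice these come from the query and each note
query_vec = embed_keywords(["semantic", "search", "notes"])
note_vec = embed_keywords(["embedding", "similarity", "vectors"])
print(cosine(query_vec, note_vec))
```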
Knowledge graph creation with embeddings.
This I don’t know much about, but if you have good resources on it, I’d be interested in learning more.
Simple training exercise
Could you elaborate?