Hi @Weilin,
Thank you for the response. I am wondering if you would be able to expand on your suggestions or point me to some resources that would help.
I agree: I’ve noticed that sentence vectors often act like a fancy regex function, but I feel they have real potential for semantic similarity! Still, maybe good sentence vectors for semantic similarity aren’t a thing just yet.
Regarding your suggestions:
A pretrained paraphrase task may be better than similarity task
Do you mean it would be better to train the model on a paraphrasing task? Or do you mean that the end application should use paraphrasing and not similarity?
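If it’s the first reading, here’s roughly what I’m picturing (a minimal sketch, assuming sentence-transformers; the model name is just an example of a paraphrase-pretrained encoder, not something you suggested):

```python
# Rough sketch of "train/use a paraphrase-pretrained model" as I understand it,
# assuming the sentence-transformers library; model name is only an example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

notes = ["first note text ...", "second note text ..."]
query = "example search query"

# Encode notes and query with the same model, then rank notes by cosine similarity
note_embeddings = model.encode(notes, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
print(util.cos_sim(query_embedding, note_embeddings))
```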
Going back to simple vectors (like fastText) and doing your search query on those embedded terms (but this really will only benefit literature search rather than notes due to corpus size)
As far as I know, fastText does not work well on word phrases (sentences), so this approach would have to embed keywords extracted from the search query and embed the notes in a similar fashion. Am I understanding you correctly?
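To check my understanding, this is roughly what I have in mind (a sketch only; the model path and keyword lists are placeholders, and the keywords would come from some extraction step I haven’t specified):

```python
# Minimal sketch of keyword-level fastText matching, assuming the official
# fasttext Python package and pretrained English vectors (path is assumed).
import numpy as np
import fasttext

model = fasttext.load_model("cc.en.300.bin")

def embed_keywords(keywords):
    # Average the fastText vectors of the extracted keywords
    return np.mean([model.get_word_vector(w) for w in keywords], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder keywords; in practice these come from the query and each note
query_vec = embed_keywords(["semantic", "search", "notes"])
note_vec = embed_keywords(["embedding", "similarity", "vectors"])
print(cosine(query_vec, note_vec))
```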
Knowledge graph creation with embeddings.
This I don’t know much about, but if you have good resources on it, I’d be interested in learning more.
Simple training exercise
Could you elaborate?