Training for sentence vectors in niche domain

Jung · August 14, 2020, 1:52am

Hi @nbroad, interesting question.
What kind of niche domain do you have in mind?
Since there are now several hundred (if not thousands of) NLP datasets available, would it be possible to find datasets similar to yours and use them for an intermediate MLM stage before the final MLM stage on your own data?

I don’t have direct experience with sentence similarity training, but I once trained a classifier in the multilingual toxic-comment domain (maybe a bit niche), where fine-tuning with MLM first did improve performance compared to skipping the MLM step.
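
In case the MLM step itself is unclear, here is a minimal sketch of how the training examples are built, assuming the standard BERT recipe (select about 15% of tokens; of those, 80% become the mask token, 10% a random token, 10% stay unchanged). The function name `mask_tokens`, the token ids, and the vocabulary size are illustrative assumptions, not taken from a specific library, and the sketch does not skip special tokens the way a real data collator would.

```python
import random

IGNORE_INDEX = -100  # label value that loss functions are commonly told to skip


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """Build (inputs, labels) for one masked-language-modeling example."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(mask_id)  # 80%: replace with the mask token
            elif roll < 0.9:
                inputs.append(rng.randrange(vocab_size))  # 10%: random token
            else:
                inputs.append(tok)  # 10%: leave the token unchanged
        else:
            inputs.append(tok)  # not selected: input is untouched
            labels.append(IGNORE_INDEX)  # and the position is not scored
    return inputs, labels


# Hypothetical token ids standing in for one tokenized domain sentence.
example = [101, 2023, 2003, 1037, 12354, 7615, 102]
inputs, labels = mask_tokens(example, mask_id=103, vocab_size=30522, mask_prob=0.5)
```

The staged idea above would then amount to running this objective first on the similar public datasets and afterwards on your own domain text, before training the final task.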
