How to deal with new vocabulary?

Hi, the project I am working on has a lot of domain-specific vocabulary. Could you please suggest techniques for tuning BERT on domain data? I have over 1 million unlabeled sentences, which I hope is enough to pre-train the language model.
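One common way to do this is "continued pretraining": take a pretrained BERT checkpoint and keep training it with the masked-language-modeling objective on the domain corpus. A minimal sketch using Hugging Face `transformers` is below — the base model name, corpus file path, and hyperparameters are placeholders, not a recipe tuned for any particular dataset:

```python
def pretrain_mlm(corpus_file, output_dir="bert-domain"):
    """Continue MLM pretraining of BERT on a one-sentence-per-line text file."""
    # Heavy optional dependencies are imported lazily so the sketch
    # can be read/imported without them installed.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # Tokenize the raw text corpus (path is a placeholder).
    ds = load_dataset("text", data_files={"train": corpus_file})["train"]
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                batched=True, remove_columns=["text"])

    # The collator applies BERT-style random masking (15% by default).
    collator = DataCollatorForLanguageModeling(tokenizer=tok,
                                               mlm_probability=0.15)
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=1,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=ds).train()
    model.save_pretrained(output_dir)
    tok.save_pretrained(output_dir)
```

With ~1M sentences, a few epochs of this on a GPU is usually feasible; the saved checkpoint can then be loaded like any other BERT model.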
My end goal is to train a multi-class classification model, but my primary interest is to pre-train the BERT language model on the domain data (the 1 million texts), take the word embeddings from the trained model, and feed them into traditional classifiers like Random Forest. Thanks!
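The second half of that pipeline can be sketched as: mean-pool BERT's last hidden states into one fixed-size vector per sentence, then fit a scikit-learn classifier on those vectors. The model directory below assumes a locally saved checkpoint, and the demo at the bottom uses random vectors in place of real embeddings just to show the classifier step end-to-end:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def embed_sentences(sentences, model_dir="bert-domain"):
    """Return one mean-pooled BERT vector per sentence (placeholder model_dir)."""
    # Lazy imports: torch/transformers are only needed for real embeddings.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    with torch.no_grad():
        enc = tok(sentences, padding=True, truncation=True,
                  return_tensors="pt")
        hidden = model(**enc).last_hidden_state      # (batch, seq, hidden)
        mask = enc["attention_mask"].unsqueeze(-1)   # ignore padding tokens
        # Mean-pool over real tokens -> (batch, hidden) numpy array.
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()


def train_classifier(X, y):
    """Fit a Random Forest on precomputed sentence embeddings."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, y)
    return clf


# Toy demo: random vectors stand in for embeddings, labels depend on dim 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = (X[:, 0] > 0).astype(int)
clf = train_classifier(X, y)
print(clf.score(X, y))
```

Mean pooling with the attention mask is one reasonable sentence-vector choice; using the `[CLS]` token's hidden state is a common alternative, though without a fine-tuned head it is often a weaker feature.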


I’d also be very interested to see if/how this could be done for BART’s encoder, since it might be a solution to this problem.