Pretraining or Finetuning

I’m new to language modeling, so I only know what I’ve seen and heard, but I thought this might be what’s so often called transfer learning.
I thought that layer freezing was essential to prevent (catastrophic) forgetting, but according to the following post, apparently that’s not really the case?
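For what it’s worth, if you do want to try freezing, here’s a minimal sketch of what it looks like with the Transformers library, assuming a `roberta-base` checkpoint and that you only want the top encoder layers and the LM head to keep updating (the exact number of layers to freeze is just an illustration):

```python
# Minimal layer-freezing sketch with Hugging Face Transformers.
# Assumes roberta-base; swap in your own checkpoint as needed.
from transformers import RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Freeze the embeddings and the lower encoder layers so they keep
# their pretrained weights; only the upper layers and the LM head
# will be updated during fine-tuning.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False

for layer in model.roberta.encoder.layer[:8]:  # freeze 8 of 12 layers (arbitrary choice)
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Whether this helps depends on your data and task; as the post above suggests, plain fine-tuning without freezing often works fine too.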
If it’s a RoBERTa model, the original author is on HF, so you could send him a direct mention and ask him how to train it. You can reach him from here too. (@+username)