We have a large amount of domain-specific data (200M+ documents, each ~100 to ~500 words long) and wanted to build a domain-specific LM.
We sampled 2M+ documents and fine-tuned RoBERTa-base on them using the Masked Language Modelling (MLM) objective.
So far (a rough sketch of this setup is shown below):
- trained for 4-5 epochs (sequence length 512, batch size 48)
- used a cosine learning-rate scheduler (2-3 cycles over the epochs)
- used dynamic masking (15% of tokens masked)
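This is a minimal sketch of the setup above, not our exact script; the corpus path, learning rate, and line-per-document data format are assumptions:

```python
from datasets import load_dataset
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Assumes one raw document per line in a text file (placeholder path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: tokens are re-masked each time a batch is collated.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="roberta-domain-mlm",
    num_train_epochs=5,
    per_device_train_batch_size=48,
    lr_scheduler_type="cosine",    # "cosine_with_restarts" gives multiple cycles
    learning_rate=5e-5,            # assumption; not stated above
    save_strategy="epoch",
)

trainer = Trainer(
    model=model, args=args, train_dataset=tokenized, data_collator=collator
)
trainer.train()
```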
Since this RoBERTa model is fine-tuned on domain-specific data, we expected it to perform better than the pre-trained RoBERTa, which was trained on general text (Wikipedia, books, etc.).
We evaluated both the fine-tuned domain-specific RoBERTa and the pre-trained RoBERTa on several tasks: Named Entity Recognition (NER), text classification, and embedding generation for cosine-similarity comparisons (a rough sketch of the embedding comparison is below).
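A minimal sketch of the cosine-similarity comparison, assuming mean pooling over the last hidden state (our actual pooling strategy may differ); the checkpoint path "roberta-domain-mlm" and the example texts are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModel

def embed(texts, model_name):
    """Return mean-pooled sentence embeddings for a list of texts."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state       # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)       # mean pooling

texts = ["first domain sentence ...", "second domain sentence ..."]
for name in ["roberta-base", "roberta-domain-mlm"]:
    a, b = embed(texts, name)
    sim = torch.nn.functional.cosine_similarity(a, b, dim=0)
    print(name, float(sim))
```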
Surprisingly, the results are essentially the same (only a very small difference) for both models. We also tried spaCy models, and the results were the same.
Perplexity scores indicate that the fine-tuned MLM-based RoBERTa has minimal loss on the domain data.
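For reference, here is how we read "perplexity": exp of the mean masked-LM cross-entropy loss on a held-out split. A minimal sketch, assuming the `trainer` from the snippet above and a tokenized held-out set named `eval_tokenized` (placeholder name):

```python
import math

# Trainer.evaluate returns the mean masked-LM loss as "eval_loss".
eval_metrics = trainer.evaluate(eval_dataset=eval_tokenized)
print("eval loss :", eval_metrics["eval_loss"])
print("perplexity:", math.exp(eval_metrics["eval_loss"]))
```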
Can anyone please help us understand why the MLM fine-tuned model is NOT performing better?
- should we use more data, more epochs, or both, to see an effect?
- are we doing anything wrong here? Let me know if any required details are missing; I will update the question.
Any suggestions or relevant links addressing these concerns would be really helpful.