I have an unlabeled data dump of 2 million sentences.
I have fine-tuned a roberta-base model with masked language modeling on the first 100k sentences using the implementation described by Lewis Tunstall in this notebook.
Now I want to fine-tune roberta-base on 250k sentences to compare the effect of training on more data on a downstream binary classification task, but I have limited compute (Google Colab).
I want to know whether the two approaches below result in the same model:
- Fine-tune a roberta-base model with masked language modeling on the first 250k sentences from scratch.
- Fine-tune the model already trained on the first 100k sentences on the next 150k sentences with masked language modeling.
In theory, I believe both should give me the exact same final model, but I want to verify that.
The second approach would also save me a large share of the compute, since it only has to process 150k sentences instead of 250k.
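For concreteness, here is a minimal sketch of the two runs I have in mind, using the Hugging Face `Trainer` for masked language modeling. This is not the exact pipeline from the notebook; the file name `sentences.txt`, the `mlm-100k` checkpoint directory, and the hyperparameters are placeholders for my actual setup.

```python
# Sketch only: assumes the 2M sentences sit in a plain-text file, one sentence per line.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Load and tokenize the raw dump ("sentences.txt" is a placeholder for my file).
raw = load_dataset("text", data_files={"train": "sentences.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM collator: randomly masks 15% of tokens on the fly.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def train_mlm(model_name_or_path, dataset, output_dir):
    """Run one MLM fine-tuning pass of `dataset` starting from the given checkpoint."""
    model = AutoModelForMaskedLM.from_pretrained(model_name_or_path)
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=32,  # placeholder; whatever fits on the Colab GPU
        num_train_epochs=1,
        fp16=True,                       # assumes a GPU runtime
        save_strategy="epoch",
    )
    Trainer(
        model=model,
        args=args,
        data_collator=collator,
        train_dataset=dataset,
    ).train()
    return output_dir

# Approach 1: roberta-base fine-tuned on the first 250k sentences in one run.
train_mlm("roberta-base", tokenized.select(range(250_000)), "mlm-250k-scratch")

# Approach 2: the existing 100k checkpoint ("mlm-100k" is assumed to be its output
# directory), continued on the next 150k sentences.
train_mlm("mlm-100k", tokenized.select(range(100_000, 250_000)), "mlm-100k-then-150k")
```

Both runs end up having seen the same 250k sentences once; my question is whether that is enough for the resulting weights to be equivalent.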