Incremental training on unlabeled data using MLM

I have an unlabeled data dump of 2 million sentences.

I have fine-tuned a roberta-base model with masked language modeling (MLM) on the first 100k sentences, using the implementation described by Lewis Tunstall in this notebook.
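For context, this is roughly the kind of setup that notebook uses, written out with the Hugging Face `Trainer`. The file name `sentences.txt`, the output path, and the hyperparameters below are placeholders rather than my exact values:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# "sentences.txt" is a placeholder for the unlabeled dump (one sentence per line)
dataset = load_dataset("text", data_files={"train": "sentences.txt"})["train"]
dataset = dataset.select(range(100_000))  # first 100k sentences

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# dynamic masking of 15% of tokens, as in standard MLM fine-tuning
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-mlm-100k",  # placeholder output path
    per_device_train_batch_size=32,
    num_train_epochs=1,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```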

Now I want to fine-tune roberta-base on 250k sentences, to see how training on more unlabeled data affects a downstream binary classification task. However, I have limited compute (Google Colab).

I want to know whether the two approaches below result in the same final model (see the code sketch after the list):

  1. Fine-tune roberta-base with masked language modeling on the first 250k sentences, starting again from the original pretrained checkpoint.

  2. Continue fine-tuning the model already trained on the first 100k sentences with masked language modeling on the next 150k sentences.

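Concretely, the only difference between the two approaches is which checkpoint the second training run starts from and which slice of data it sees (the paths below reuse the placeholder names from the sketch above):

```python
from transformers import AutoModelForMaskedLM

# Approach 1: start again from the pretrained checkpoint and
# fine-tune with MLM on the first 250k sentences in one run
model_1 = AutoModelForMaskedLM.from_pretrained("roberta-base")
# ... train model_1 on sentences[:250_000] ...

# Approach 2: load the checkpoint already fine-tuned on the first
# 100k sentences (placeholder path) and continue on the next 150k
model_2 = AutoModelForMaskedLM.from_pretrained("roberta-mlm-100k")
# ... train model_2 on sentences[100_000:250_000] ...
```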
In theory, I believe both should give me the exact same final model, but I want to verify this.
The second approach would save me the compute already spent on the first 100k sentences, i.e. roughly 40% of what the first approach needs.
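One way I could verify this (e.g. on a small subset, to stay within Colab limits) is to train both ways and compare the resulting weights directly; the checkpoint paths below are hypothetical:

```python
import torch
from transformers import AutoModelForMaskedLM

# Hypothetical paths to the checkpoints produced by the two approaches
model_a = AutoModelForMaskedLM.from_pretrained("mlm-250k-one-run")    # approach 1
model_b = AutoModelForMaskedLM.from_pretrained("mlm-100k-then-150k")  # approach 2

state_a, state_b = model_a.state_dict(), model_b.state_dict()

# Exact equality check per parameter tensor, plus the largest absolute
# difference so "close but not identical" is visible too
identical = all(torch.equal(state_a[k], state_b[k]) for k in state_a)
max_diff = max((state_a[k].float() - state_b[k].float()).abs().max().item() for k in state_a)
print(f"identical: {identical}, max abs weight difference: {max_diff:.2e}")
```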