Pretraining or Finetuning

I’m new to language modeling, so I only know what I’ve seen and heard, but I thought this might be what’s so often called transfer learning.
I thought that layer freezing was essential to prevent (catastrophic) forgetting, but according to the following post, apparently that’s not really the case?
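For what it’s worth, if you do want to try freezing, here’s a minimal sketch of what it looks like with the Transformers library, assuming a `roberta-base` checkpoint and that you only want the top encoder layers and the LM head to keep updating (the exact number of layers to freeze is just an illustration):

```python
# Minimal layer-freezing sketch with Hugging Face Transformers.
# Assumes roberta-base; swap in your own checkpoint as needed.
from transformers import RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Freeze the embeddings and the lower encoder layers so they keep
# their pretrained weights; only the upper layers and the LM head
# will be updated during fine-tuning.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False

for layer in model.roberta.encoder.layer[:8]:  # freeze 8 of 12 layers (arbitrary choice)
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Whether this helps depends on your data and task; as the post above suggests, plain fine-tuning without freezing often works fine too.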
If it’s a RoBERTa model, the original author is on HF, so you could send him a direct mention and ask him how to train it. You can reach him from here too. (@+username)