Transfer learning means initialising your model with the weights of a model someone else has already trained. Training from scratch means initialising it with random weights.
Training from scratch requires a lot of data and a lot of resources.
You might need to train from scratch if your data is completely different from what the standard pre-trained models were trained on, for example text in a different language, or chemical symbols. Otherwise, it will probably be better to use transfer learning, starting from a model trained on the closest kind of data you can find.
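Here is roughly what the difference looks like with the Hugging Face transformers library. This is only a sketch: the model name, task head, and number of labels are placeholders I've picked for illustration, not anything prescribed above.

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Transfer learning: start from someone else's trained weights.
model_pretrained = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Training from scratch: same architecture, but randomly initialised weights.
config = AutoConfig.from_pretrained("bert-base-uncased", num_labels=2)
model_scratch = AutoModelForSequenceClassification.from_config(config)
```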
A lot of people do intermediate training: continuing to train the pre-trained model on your own data, but not yet on your final downstream task.
For example, you might start with a pre-trained BERT, such as bert-base-uncased. Then you might do masked language modelling (MLM) on your own text. Finally, you might train for sequence classification, using your text and your labels.
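A sketch of that two-stage recipe with Hugging Face transformers is below. The file name, column names, output directories, and hyperparameters are all assumptions for the sake of the example; only the overall shape (MLM first, then classification starting from the MLM checkpoint) is the point.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: intermediate training -- masked language modelling on your text only.
raw = load_dataset("text", data_files={"train": "my_domain_text.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("bert-base-uncased"),
    args=TrainingArguments(output_dir="bert-mlm-intermediate", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("bert-mlm-intermediate")

# Stage 2: downstream training -- sequence classification on your text and labels,
# starting from the intermediate checkpoint instead of plain bert-base-uncased.
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-mlm-intermediate", num_labels=2
)
# ...then build a labelled dataset and a second Trainer for it, and train as above.
```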
If your text is quite similar to the BERT training corpus (Wikipedia plus the BookCorpus), then you could probably get good results by unfreezing only half of the BERT layers. If your text is very different, you might get better results by unfreezing all the layers.
If your text is very similar to the BERT corpus, you might not need intermediate training at all, and you might not need to unfreeze any layers. If the results from using pre-trained BERT directly on your downstream task are "good enough", stop there.
The more you unfreeze, the longer the training will take.
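Here is roughly how the "unfreeze only half the layers" option looks in code, again as a sketch. Freezing the embeddings and the bottom six of the twelve encoder layers (rather than some other split) is my own assumption about which half to keep frozen.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the embeddings and the first 6 of the 12 encoder layers;
# the top 6 layers and the classification head stay trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:6]:
    for param in layer.parameters():
        param.requires_grad = False
```

Parameters with requires_grad set to False are skipped by the optimiser, which is why the more you freeze, the less work each training step has to do.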
I don't know whether you should freeze the same layers for your downstream task training as for your intermediate training. Maybe you could try freezing half the layers for intermediate training, but freezing all the BERT layers (so only the classification head is trained) for your downstream task training.