A curious question to improve my understanding:
Fine-tuning a BERT model for your downstream task can be important, so I would like to tune the BERT weights. I understand I can extract them from a BertForSequenceClassification model, which I can fine-tune.
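For instance, I believe the encoder is exposed as `model.bert` (a minimal sketch, assuming the Hugging Face transformers library and `bert-base-uncased` as the checkpoint):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The underlying BERT encoder is exposed as `model.bert`; its weights
# can be saved or reused independently of the classifier head.
model.bert.save_pretrained("my-finetuned-bert")
```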
If you fine-tune e.g. BertForSequenceClassification, you tune the weights of the BERT model and of the classifier layer too.
But to fine-tune properly, you would first need to freeze the BERT weights and train only the classifier; afterwards you fine-tune the BERT weights too, right?
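Something like this two-phase schedule is what I have in mind (a sketch against the model above; how long to keep the encoder frozen is an open choice):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Phase 1: freeze the BERT encoder so only the classifier head trains.
for param in model.bert.parameters():
    param.requires_grad = False
# ... train the classifier for a few epochs ...

# Phase 2: unfreeze the encoder and fine-tune everything end to end,
# typically with a smaller learning rate.
for param in model.bert.parameters():
    param.requires_grad = True
# ... continue training ...
```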
Now, there are myriad ways to fine-tune the BERT weights, right?
If I just use the main BERT model together with an arbitrary neural network architecture on top, I could fine-tune the BERT weights this way too, right? For example, something like the sketch below.
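A sketch of what I mean, assuming PyTorch and the bare `BertModel`; the class name `BertWithCustomHead` and the head layout are made up for illustration:

```python
import torch.nn as nn
from transformers import BertModel

class BertWithCustomHead(nn.Module):
    """Hypothetical example: bare BERT encoder plus an arbitrary head."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Any architecture can sit on top of the pooled output.
        self.head = nn.Sequential(
            nn.Linear(self.bert.config.hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(outputs.pooler_output)
```

Since gradients flow from the head back into the encoder, optimizing all parameters would fine-tune the BERT weights as well.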