What is the best way to fine-tune a Hugging Face model on the MLM (masked language modeling) task using my own fairly small dataset (10K sentences), particularly with regard to freezing and unfreezing layers?
Let's say the Hugging Face model has 12 encoder layers and 12 decoder layers, and I want to train it on my own dataset with MLM. Is it better to train only 1 encoder layer and 1 decoder layer, so that my dataset's information gets incorporated into the model without the model forgetting what it previously learned, or should I retrain all the encoder and decoder layers?
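To make the question concrete, here is a rough sketch of what I mean by "training only the top layer". It is only an illustration under assumptions: it assumes an encoder-only, BERT-style model whose parameter names contain `encoder.layer.<index>.` and whose MLM head sits under the `cls.` prefix, and the helper name `freeze_all_but_top_layers` is my own, not a library function.

```python
import re

# Assumption: parameter names look like "bert.encoder.layer.11.output.dense.weight".
LAYER_RE = re.compile(r"\.layer\.(\d+)\.")


def layer_index(name):
    """Return the transformer layer index encoded in a parameter name, or None."""
    match = LAYER_RE.search(name)
    return int(match.group(1)) if match else None


def freeze_all_but_top_layers(model, n_trainable=1, head_prefix="cls."):
    """Leave only the top `n_trainable` layers and the MLM head trainable.

    `model` is any object exposing `named_parameters()` the way a
    torch.nn.Module does. `head_prefix` is an assumed name prefix for the
    MLM head. Returns the names of the parameters left trainable.
    """
    named = list(model.named_parameters())
    indices = [layer_index(name) for name, _ in named]
    found = [i for i in indices if i is not None]
    if not found:
        raise ValueError("no parameter name matched the layer pattern")

    # Layers are numbered from 0, so the top n layers start here.
    first_trainable = max(found) + 1 - n_trainable

    trainable = []
    for (name, param), idx in zip(named, indices):
        in_top_layer = idx is not None and idx >= first_trainable
        # Everything else (embeddings, lower layers) is frozen.
        param.requires_grad = in_top_layer or name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With `transformers`, I would call this on something like `BertForMaskedLM.from_pretrained("bert-base-uncased")` before training. Setting `n_trainable=1` is the "train one layer" option from my question, while `n_trainable=12` unfreezes every layer of a 12-layer model and keeps only the embeddings frozen.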
It would be very helpful if you could give some intuition on this.