Has anyone successfully used Hugging Face to further pretrain the basic BERT model and apply it to a sequence classification task? I tried to find relevant posts or documentation on the Hugging Face website but could not find anything useful. The usual way to further pretrain BERT is to use the original Google BERT implementation. I want to stick with Hugging Face and see if there is a way to do this further pretraining there. If anyone could offer some help or guidance, I would sincerely appreciate it.
You seem to be looking for the term finetuning. You should be able to find a lot of information if you search for that term. Here is a good one to get you started: https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb
Hi Bram. Thank you very much for your kind reply. After reading it, I reconsidered the so-called further pretraining that I wanted to do. I realize that I can just directly train, for example, a BertForSequenceClassification model, and by fitting the model to my data I am already doing the further pretraining and the classification at the same time, because all the model parameters will be updated according to my new data (as long as I don't intentionally freeze the weights of the transformer layers). Am I understanding this correctly?
That is correct, but only if you use BertForSequenceClassification.from_pretrained to load the existing weights. You can then train that model further. If you do not use from_pretrained, you will train a new model from scratch, and that is probably not what you want.
This process is called finetuning the model on a downstream task. Do not call it pretraining; that is confusing. Pretraining is the initial training that produced the weights that currently exist in the models that we can use.
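To make the difference concrete, here is a minimal sketch of the two ways of creating the model, plus the optional freezing mentioned in the question. The checkpoint name "bert-base-uncased", the label count, and the helper function names are assumptions for illustration only; they are not from this thread, so substitute your own.

```python
from transformers import BertConfig, BertForSequenceClassification

# Assumed values for illustration; replace with your own checkpoint and label count.
CHECKPOINT = "bert-base-uncased"
NUM_LABELS = 2


def load_for_finetuning(checkpoint=CHECKPOINT, num_labels=NUM_LABELS):
    """Load the pretrained encoder weights; only the classifier head starts out random."""
    return BertForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels)


def build_from_scratch(config=None):
    """Build the same architecture from a config only, so every weight is random."""
    if config is None:
        config = BertConfig(num_labels=NUM_LABELS)
    return BertForSequenceClassification(config)


def freeze_encoder(model):
    """Optional: stop gradient updates to the BERT encoder so only the head is trained."""
    for param in model.bert.parameters():
        param.requires_grad = False
    return model


def count_trainable(model):
    """Number of parameters that an optimizer would actually update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Calling load_for_finetuning() downloads the checkpoint on first use and gives a model whose encoder already carries the pretrained weights, which is the finetuning setup described above. build_from_scratch() gives the same architecture with random weights. If you leave freeze_encoder() out, all parameters are updated during training, as the question assumed.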
Understood. Thanks so much, Bram. This is of great help to me!