Further Pretrain Basic BERT for sequence classification

TonyGong97 · October 9, 2020, 12:30am

Has anyone successfully used Hugging Face to further pretrain the basic BERT model and apply it to a sequence classification task? I looked for relevant posts and documentation on the Hugging Face website but did not find anything useful. The usual way to further pretrain BERT is to use the original Google BERT implementation. I want to stick with Hugging Face and see if there is a way to do this further pretraining there. If anyone could offer some help or guidance, I would sincerely appreciate it.

BramVanroy · October 9, 2020, 9:14am

You seem to be looking for the term finetuning. You should be able to find a lot of information if you search for that term. Here is a good one to get you started: https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb

TonyGong97 · October 9, 2020, 2:35pm

Hi Bram. Thank you very much for your kind reply. After reading it, I reconsidered the so-called further pre-training that I want to do. I realize that I can simply train, for example, a BertForSequenceClassification model directly. By fitting the model to my data I am already doing the further pre-training and the classification at the same time, because all the model parameters will be updated on my new data (as long as I don’t intentionally freeze the weight updates for the transformer layers). Am I understanding this correctly?

BramVanroy · October 9, 2020, 4:05pm

That is correct, but only if you use BertForSequenceClassification.from_pretrained to load the existing weights. You can then train that model further. If you do not use from_pretrained, you will train a new model from scratch, and that is probably not what you want.
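A minimal sketch of that difference, assuming the transformers and torch packages are installed. To keep it runnable offline, a tiny randomly initialized BertModel is saved to a temporary directory and stands in for a real pretrained checkpoint. The config sizes and the names checkpoint_dir, finetune_model and scratch_model are illustrative assumptions. In practice you would pass a hub checkpoint name such as "bert-base-uncased" to from_pretrained.

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification, BertModel

# Tiny config (assumed values) so the sketch runs quickly without downloads.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=2,
)

# Stand-in for a pretrained encoder; a real one would come from the hub.
torch.manual_seed(0)
pretrained_encoder = BertModel(config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    pretrained_encoder.save_pretrained(checkpoint_dir)
    # from_pretrained reuses the saved encoder weights; only the
    # classification head on top is newly initialized.
    finetune_model = BertForSequenceClassification.from_pretrained(
        checkpoint_dir, num_labels=2
    )

# Building the model from a config alone initializes every weight afresh.
torch.manual_seed(1)
scratch_model = BertForSequenceClassification(config)


def word_embeddings(model):
    """Return the word embedding matrix of a BertForSequenceClassification."""
    return model.bert.embeddings.word_embeddings.weight


reference = pretrained_encoder.embeddings.word_embeddings.weight
loaded_matches = torch.equal(word_embeddings(finetune_model), reference)
scratch_matches = torch.equal(word_embeddings(scratch_model), reference)
print("from_pretrained keeps encoder weights:", loaded_matches)
print("config-only model keeps encoder weights:", scratch_matches)
```

The first comparison should report that the encoder weights were carried over, while the config-only model starts from unrelated random values.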

This process is called finetuning the model on a downstream task. Do not call it pretraining, because that is confusing. Pretraining is the initial training that produced the weights in the existing models that we can use.
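As a rough illustration of the point discussed above (finetuning updates the encoder weights too, unless they are frozen on purpose), here is a sketch of a single training step, assuming transformers and torch are installed. The tiny config, the random token batch, the learning rate and the helper name one_step are all hypothetical choices made for the example. A real run would load weights with from_pretrained and iterate over a tokenized dataset.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny config (assumed values); dropout is disabled to keep the step simple.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=2,
    hidden_dropout_prob=0.0,
    attention_probs_dropout_prob=0.0,
)


def one_step(freeze_encoder):
    """Run one optimizer step and report whether an encoder weight changed."""
    model = BertForSequenceClassification(config)
    if freeze_encoder:
        # Freezing: the encoder receives no gradient and is not optimized.
        for param in model.bert.parameters():
            param.requires_grad = False

    tracked = model.bert.encoder.layer[0].attention.self.query.weight
    before = tracked.detach().clone()

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)

    # Random stand-in batch: 4 sequences of 8 token ids with binary labels.
    input_ids = torch.randint(1, config.vocab_size, (4, 8))
    labels = torch.tensor([0, 1, 0, 1])

    model.train()
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()

    return not torch.equal(before, tracked.detach())


encoder_changed_full = one_step(freeze_encoder=False)
encoder_changed_frozen = one_step(freeze_encoder=True)
print("encoder updated without freezing:", encoder_changed_full)
print("encoder updated with freezing:", encoder_changed_frozen)
```

Without freezing, the tracked attention weight moves after one step. With the encoder frozen, only the classification head is trained.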

TonyGong97 · October 9, 2020, 5:55pm

Understood. Thanks so much, Bram. This is of great help to me!
