Fine-Tune BERT Models

Hugging Face Transformers: Fine-tuning DistilBERT for Binary Classification Tasks
A Beginner’s Guide to NLP and Transfer Learning in TF 2.0

After (optionally) modifying DistilBERT’s configuration class, we can pass both the model name and the configuration object to the .from_pretrained() method of the TFDistilBertModel class to instantiate the base DistilBERT model without any specific head on top (as opposed to other classes such as TFDistilBertForSequenceClassification, which do have an added classification head). We do not want any task-specific head attached, because we only want the pre-trained weights of the base model to provide a general understanding of the English language; it will be our job to add our own classification head during the fine-tuning process.

Because DistilBERT’s pre-trained weights will serve as the basis for our model, we want to preserve them and prevent them from updating during the initial stages of training, while our model is learning reasonable weights for the added classification layers. To temporarily freeze DistilBERT’s pre-trained weights, we set layer.trainable = False for each of DistilBERT’s layers; once model performance converges, we can unfreeze them by setting layer.trainable = True.

from transformers import TFDistilBertModel, DistilBertConfig
DISTILBERT_DROPOUT = 0.2
DISTILBERT_ATT_DROPOUT = 0.2
 
# Configure DistilBERT's initialization
config = DistilBertConfig(dropout=DISTILBERT_DROPOUT, 
                          attention_dropout=DISTILBERT_ATT_DROPOUT, 
                          output_hidden_states=True)
                          
# The bare, pre-trained DistilBERT transformer model outputting raw hidden-states,
# without any specific head on top.
distilBERT = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)

# Make DistilBERT layers untrainable
for layer in distilBERT.layers:
    layer.trainable = False
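
With the base model frozen, we can wire it into a Keras model and add our own classification head, as described above. The snippet below is a minimal sketch for binary classification with a single sigmoid output; MAX_LENGTH and LEARNING_RATE are illustrative placeholders, and the pooled representation is taken from the [CLS] token of the last hidden state.

import tensorflow as tf

MAX_LENGTH = 128      # illustrative sequence length; match your tokenizer settings
LEARNING_RATE = 5e-5  # illustrative starting learning rate

# Token IDs and attention masks produced by the DistilBERT tokenizer
input_ids = tf.keras.layers.Input(shape=(MAX_LENGTH,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(MAX_LENGTH,), name='attention_mask', dtype='int32')

# Run the frozen base model and take the [CLS] token's hidden state
# from the last layer as a summary of the whole sequence
last_hidden_state = distilBERT([input_ids, attention_mask])[0]
cls_token = last_hidden_state[:, 0, :]

# Our own classification head: a single sigmoid unit for binary classification
output = tf.keras.layers.Dense(1, activation='sigmoid')(cls_token)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
              loss='binary_crossentropy',
              metrics=['accuracy'])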
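Once the new classification head has converged, the frozen layers can be unfrozen for end-to-end fine-tuning. The sketch below assumes the model built above; the lower learning rate is an illustrative value, and the model must be recompiled for the change in trainability to take effect.

# Unfreeze DistilBERT's layers so their weights update during fine-tuning
for layer in distilBERT.layers:
    layer.trainable = True

# Recompile so Keras picks up the change in trainable weights;
# a much lower learning rate helps avoid overwriting the pre-trained weights
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# model.fit(...) can then be called again to fine-tune end to end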