Fine-Tune BERT Models

Hugging Face Transformers: Fine-tuning DistilBERT for Binary Classification Tasks
A Beginner’s Guide to NLP and Transfer Learning in TF 2.0

After (optionally) modifying DistilBERT’s configuration class, we can pass both the model name and the configuration object to the .from_pretrained() method of the TFDistilBertModel class to instantiate the base DistilBERT model without any specific head on top (as opposed to other classes such as TFDistilBertForSequenceClassification, which do have an added classification head). We do not want any task-specific head attached, because we only want the pre-trained weights of the base model to provide a general understanding of the English language; it will be our job to add our own classification head during the fine-tuning process.

Because DistilBERT’s pre-trained weights will serve as the basis for our model, we want to preserve them and prevent them from updating during the initial stages of training, while our model is learning reasonable weights for the added classification layers. To temporarily freeze DistilBERT’s pre-trained weights, we set layer.trainable = False for each of DistilBERT’s layers; once model performance converges, we can unfreeze them by setting layer.trainable = True.

from transformers import TFDistilBertModel, DistilBertConfig
DISTILBERT_DROPOUT = 0.2
DISTILBERT_ATT_DROPOUT = 0.2
 
# Configure DistilBERT's initialization
config = DistilBertConfig(dropout=DISTILBERT_DROPOUT, 
                          attention_dropout=DISTILBERT_ATT_DROPOUT, 
                          output_hidden_states=True)
                          
# The bare, pre-trained DistilBERT transformer model outputting raw hidden-states,
# without any specific head on top.
distilBERT = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)

# Make DistilBERT layers untrainable
for layer in distilBERT.layers:
    layer.trainable = False
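
With the base model frozen, we can wire it into a Keras model and add our own classification head, as described above. The snippet below is a minimal sketch for binary classification with a single sigmoid output; MAX_LENGTH and LEARNING_RATE are illustrative placeholders, and the pooled representation is taken from the [CLS] token of the last hidden state.

import tensorflow as tf

MAX_LENGTH = 128      # illustrative sequence length; match your tokenizer settings
LEARNING_RATE = 5e-5  # illustrative starting learning rate

# Token IDs and attention masks produced by the DistilBERT tokenizer
input_ids = tf.keras.layers.Input(shape=(MAX_LENGTH,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(MAX_LENGTH,), name='attention_mask', dtype='int32')

# Run the frozen base model and take the [CLS] token's hidden state
# from the last layer as a summary of the whole sequence
last_hidden_state = distilBERT([input_ids, attention_mask])[0]
cls_token = last_hidden_state[:, 0, :]

# Our own classification head: a single sigmoid unit for binary classification
output = tf.keras.layers.Dense(1, activation='sigmoid')(cls_token)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
              loss='binary_crossentropy',
              metrics=['accuracy'])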
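Once the new classification head has converged, the frozen layers can be unfrozen for end-to-end fine-tuning. The sketch below assumes the model built above; the lower learning rate is an illustrative value, and the model must be recompiled for the change in trainability to take effect.

# Unfreeze DistilBERT's layers so their weights update during fine-tuning
for layer in distilBERT.layers:
    layer.trainable = True

# Recompile so Keras picks up the change in trainable weights;
# a much lower learning rate helps avoid overwriting the pre-trained weights
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# model.fit(...) can then be called again to fine-tune end to end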