After (optionally) modifying DistilBERT’s configuration class, we can pass both the model name and the configuration object to the .from_pretrained() method of the TFDistilBertModel class to instantiate the base DistilBERT model without any specific head on top (as opposed to other classes such as TFDistilBertForSequenceClassification, which do have an added classification head). We do not want any task-specific head attached, because we simply want the pre-trained weights of the base model to provide a general understanding of the English language; it will be our job to add our own classification head during the fine-tuning process.

Because DistilBERT’s pre-trained weights will serve as the basis for our model, we want to preserve them and prevent them from updating during the initial stages of training, while our model is still learning reasonable weights for the added classification layers. To temporarily freeze DistilBERT’s pre-trained weights, we set layer.trainable = False for each of DistilBERT’s layers; we can later unfreeze them by setting layer.trainable = True once model performance converges.
from transformers import TFDistilBertModel, DistilBertConfig
DISTILBERT_DROPOUT = 0.2
DISTILBERT_ATT_DROPOUT = 0.2
# Configure DistilBERT's initialization
config = DistilBertConfig(dropout=DISTILBERT_DROPOUT,
                          attention_dropout=DISTILBERT_ATT_DROPOUT,
                          output_hidden_states=True)
# The bare, pre-trained DistilBERT transformer model outputting raw hidden-states
# and without any specific head on top.
distilBERT = TFDistilBertModel.from_pretrained('distilbert-base-uncased', config=config)
# Make DistilBERT layers untrainable
for layer in distilBERT.layers:
    layer.trainable = False
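Later, once our added classification head has converged, we can unfreeze DistilBERT’s layers by flipping layer.trainable back to True. The snippet below is a minimal sketch of that step: the model object (the full fine-tuning model built on top of distilBERT), the optimizer, learning rate, loss, and metric are illustrative assumptions, not part of the code above. Note that in Keras, changes to a layer’s trainable attribute only take effect after the model is recompiled.

import tensorflow as tf

# Unfreeze DistilBERT's layers once performance with frozen weights converges
for layer in distilBERT.layers:
    layer.trainable = True

# Recompile so the change to `trainable` takes effect during training.
# `model` is a placeholder for the full fine-tuning model; a lower learning
# rate is a common choice when updating pre-trained weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])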