Tutorial: Fine-tuning with custom datasets – sentiment, NER, and question answering

I have two (very basic) questions:

  1. I assume the tutorial fine-tunes the entire model at once. Is there an easy way to first train only the classification head and only then unfreeze the rest of the model? (A sketch of what I have in mind follows this list.)
  2. Is the classification head in BertForSequenceClassification pre-trained, or is it initialized randomly on top of BertModel? If pre-trained, which task/dataset was used for pre-training? (See the check after the note below.)
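
For question 1, here is a minimal sketch of the two-phase setup I have in mind, assuming `bert-base-uncased` and the standard fine-tuning loop from the tutorial (`num_labels=2` is just a placeholder):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # num_labels is a placeholder
)

# Phase 1: freeze the base encoder so only the classification head trains.
for param in model.bert.parameters():
    param.requires_grad = False

# ... train the head for a few epochs here (Trainer or a manual loop) ...

# Phase 2: unfreeze the encoder and continue fine-tuning end to end,
# typically with a lower learning rate.
for param in model.bert.parameters():
    param.requires_grad = True
```

Is this the idiomatic way to do it, or is there built-in support for staged unfreezing?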

Note: I’ve been using BERT instead of DistilBERT, but I guess the same applies to both.
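
For question 2, my current guess is that the head is random: if I'm reading the load-time output correctly, transformers warns that some weights (including the classifier) were newly initialized rather than loaded from the checkpoint. A minimal way to reproduce what I'm seeing:

```python
from transformers import BertForSequenceClassification

# Loading a sequence-classification model from a plain BERT checkpoint
# prints a warning listing the weights that were newly initialized;
# the classifier head shows up there, which is what made me suspect
# it is random rather than pre-trained.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
```

Is that the right way to interpret the warning?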