I have two (very basic) questions:
- I assume the tutorial fine-tunes the entire model at once. Is there an easy way to first train only the classification head and only then unfreeze the rest of the model? (See the sketch below for what I have in mind.)
- Is the classification head in `BertForSequenceClassification` pre-trained, or is it initialized randomly on top of `BertModel`? If pre-trained, which task/dataset was used for pre-training?
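For the first question, here is a minimal sketch of what I mean, assuming the usual PyTorch approach of toggling `requires_grad` on the encoder's parameters (the `num_labels=2` is just for illustration) — I'm mainly asking whether there's a cleaner built-in way than this:

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Stage 1: freeze the BERT encoder so only the classification head trains.
for param in model.bert.parameters():
    param.requires_grad = False

# ... train the classification head here ...

# Stage 2: unfreeze everything and fine-tune the whole model.
for param in model.bert.parameters():
    param.requires_grad = True

# ... continue fine-tuning the full model here ...
```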
Note: I’ve been using BERT instead of DistilBERT, but I guess the same applies to both.