DistilBert weights initialization

meisyarahd · August 19, 2020, 2:22pm

I want to train a DistilBertModel from scratch on my own corpus, using BertModel as the teacher model. Following the DistilBERT paper, what’s the best way to initialize the weights of my DistilBert with part of the teacher model’s weights?

It seems the two models are built from different classes (e.g. BertAttention in BertModel and MultiHeadSelfAttention in DistilBertModel). In this case, I don’t know if I can just “assign” the teacher’s layers to the DistilBert’s layers…

valhalla · August 20, 2020, 3:16pm

Hi @meisyarahd, you can find the distillation example here
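
On the “assign” question: the class names differ, but the underlying tensors have the same shapes, so you can copy them by renaming state-dict keys. Here is a rough sketch of that idea, not the official script. The DistilBERT paper initializes the student by taking one layer out of two from the teacher; the exact layer indices below, and the helper names build_key_map and init_student_state, are my own illustrative choices. The key names assume a plain BertModel and DistilBertModel from transformers, so check them against model.state_dict().keys() for your version.

```python
# Sub-module names inside one BERT encoder layer, mapped to the matching
# sub-module names inside one DistilBERT transformer block.
LAYER_MAP = {
    "attention.self.query": "attention.q_lin",
    "attention.self.key": "attention.k_lin",
    "attention.self.value": "attention.v_lin",
    "attention.output.dense": "attention.out_lin",
    "attention.output.LayerNorm": "sa_layer_norm",
    "intermediate.dense": "ffn.lin1",
    "output.dense": "ffn.lin2",
    "output.LayerNorm": "output_layer_norm",
}

# Embedding parameters that exist under the same name in both models.
# BERT's token_type_embeddings and pooler have no DistilBERT counterpart.
EMBEDDING_PARAMS = (
    "word_embeddings.weight",
    "position_embeddings.weight",
    "LayerNorm.weight",
    "LayerNorm.bias",
)


def build_key_map(teacher_layers=(0, 2, 4, 7, 9, 11)):
    """Return {student_key: teacher_key} for a student with len(teacher_layers) blocks.

    teacher_layers is an assumption: which teacher layers seed the student.
    """
    key_map = {}
    for param in EMBEDDING_PARAMS:
        key_map[f"embeddings.{param}"] = f"embeddings.{param}"
    for student_idx, teacher_idx in enumerate(teacher_layers):
        for teacher_name, student_name in LAYER_MAP.items():
            for param in ("weight", "bias"):
                student_key = f"transformer.layer.{student_idx}.{student_name}.{param}"
                teacher_key = f"encoder.layer.{teacher_idx}.{teacher_name}.{param}"
                key_map[student_key] = teacher_key
    return key_map


def init_student_state(teacher_state, teacher_layers=(0, 2, 4, 7, 9, 11)):
    """Build a student state dict by picking renamed entries from the teacher's."""
    key_map = build_key_map(teacher_layers)
    return {s_key: teacher_state[t_key] for s_key, t_key in key_map.items()}


# Hypothetical usage (requires torch and transformers; the student config must
# match the teacher's vocabulary size, hidden size, and number of heads):
#   teacher = BertModel.from_pretrained("bert-base-uncased")
#   student = DistilBertModel(DistilBertConfig(n_layers=6))
#   student.load_state_dict(init_student_state(teacher.state_dict()), strict=False)
```

Passing strict=False is a precaution in case your version registers extra buffers that the mapping does not cover. If your teacher is a BertForMaskedLM rather than a bare BertModel, the teacher keys will carry an extra "bert." prefix that you would need to add.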

