How do I train a distilled DeiT model from scratch?


I want to train the DeiTForImageClassificationWithTeacher model from scratch for an image classification task.
I successfully trained the DeiT model without a teacher, but DeiTForImageClassificationWithTeacher takes a head_mask instead of labels as an input.
Sadly, I could not find any documentation on how to compute this head mask.
Do I need to train a CNN teacher model myself beforehand? I thought a standard teacher would be included in DeiTForImageClassificationWithTeacher.
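For context, here is my understanding of the hard-label distillation objective from the DeiT paper. It is sketched in plain Python for a single example, and the function names are my own, not part of transformers. The class-token head is trained on the true label, while the distillation-token head is trained on the teacher's argmax prediction. That is why I assume some teacher has to supply logits during training.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of softmax(logits) against an integer class index."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def hard_distillation_loss(
    cls_logits: Sequence[float],
    distill_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
) -> float:
    """Hard-label distillation loss for one example (my reading of the DeiT paper)."""
    # The teacher's hard decision y_t = argmax(teacher logits) is the distillation target.
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    # Class-token head learns from the true label, distillation-token head from the teacher.
    return 0.5 * cross_entropy(cls_logits, label) + 0.5 * cross_entropy(
        distill_logits, teacher_label
    )


if __name__ == "__main__":
    # Made-up logits for a 3-class problem, just to exercise the function.
    loss = hard_distillation_loss(
        cls_logits=[2.0, 0.5, -1.0],
        distill_logits=[0.1, 1.5, 0.0],
        teacher_logits=[0.3, 2.2, -0.4],  # hypothetical output of a separate teacher model
        label=0,
    )
    print(f"{loss:.4f}")
```

This is only my reading of the paper; I have not found where (or whether) transformers computes this loss for the model with teacher.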

Thanks for your help