I was referring to this code from @philschmid. I could follow most of it, but I have a few doubts; please help me clarify them.

In the code below:
```python
from transformers import Trainer


class DistillationTrainer(Trainer):
    def __init__(self, *args, teacher_model=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher_model
        # place the teacher on the same device as the student
        self._move_model_to_device(self.teacher, self.model.device)
        self.teacher.eval()
```
When I use a fine-tuned teacher model, is it correct that the teacher is never fine-tuned during the task-specific distillation training, as the `self.teacher.eval()` line suggests? In other words, is only the teacher model's output used for the loss calculation?
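For reference, my understanding of how the teacher's output enters the loss is roughly the sketch below. This is not the exact tutorial code; in particular, I am assuming the `alpha` and `temperature` distillation hyperparameters are stored on `self.args`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rough sketch of DistillationTrainer.compute_loss as I understand it;
# alpha and temperature on self.args are my assumption.
def compute_loss(self, model, inputs, return_outputs=False):
    # student forward pass -- gradients flow through these logits
    outputs_student = model(**inputs)
    student_loss = outputs_student.loss
    # teacher forward pass -- wrapped in no_grad(), so no gradients
    # are ever computed for the teacher's parameters
    with torch.no_grad():
        outputs_teacher = self.teacher(**inputs)
    # soften both distributions with the temperature and compare via KL;
    # KLDivLoss expects log-probabilities for the input, probabilities
    # for the target
    kd_loss = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(outputs_student.logits / self.args.temperature, dim=-1),
        F.softmax(outputs_teacher.logits / self.args.temperature, dim=-1),
    ) * (self.args.temperature ** 2)
    # weighted mix of the hard-label loss and the distillation loss
    loss = self.args.alpha * student_loss + (1.0 - self.args.alpha) * kd_loss
    return (loss, outputs_student) if return_outputs else loss
```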
I couldn't follow the line `self._move_model_to_device(self.teacher, self.model.device)`. What is it actually doing?
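If I read it correctly, `_move_model_to_device` is a `Trainer` helper that does roughly the equivalent of the plain PyTorch move below (my paraphrase, not the actual implementation):

```python
# My reading: make sure the teacher's weights live on the same device
# (GPU/CPU) as the student's, so the two forward passes in compute_loss
# don't mix tensors from different devices.
device = next(student_model.parameters()).device  # e.g. cuda:0
teacher_model = teacher_model.to(device)
```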
In task-specific distillation training I am fine-tuning my student model, but I pass both models to the `DistillationTrainer`. Where does it make sure that only the student model's weights are learned and not the teacher's? My current guess is sketched after the snippet below.
```python
trainer = DistillationTrainer(
    student_model,
    training_args,
    teacher_model=teacher_model,
    train_dataset=train_data,
    eval_dataset=val_data,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```
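My current guess: `trainer.model` is the student, and the `Trainer` builds its optimizer from `trainer.model`'s parameters only, while the teacher is only ever called under `torch.no_grad()`. If that is right, an explicit freeze like the one below would be redundant but harmless (my own addition, not from the tutorial); is this understanding correct?

```python
# eval() only switches layer behaviour (dropout, batch norm); it does not
# stop gradients. But since the teacher is only called inside torch.no_grad()
# and its parameters are never handed to the optimizer, it is never updated.
# Explicitly freezing it anyway, as an extra safety net:
for param in teacher_model.parameters():
    param.requires_grad_(False)

trainer.train()  # fine-tunes only the student; the teacher just provides targets
```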