I’m working on knowledge distillation from a relatively large model to a smaller one (for example, an 11B teacher into a 2B student) in a DDP environment.
Usually, we do the following to prepare the model:

```python
model, optimizer = accelerator.prepare(model, optimizer)
```
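For context, a minimal self-contained version of that usual single-model loop might look like the sketch below; the toy `nn.Linear` model, the dummy loss, and the random data are just placeholders for my actual setup:

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()

model = nn.Linear(16, 8)  # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

# prepare() wraps the model for the current setup (e.g. DDP) and
# patches the optimizer and dataloader to match.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    loss = model(batch).pow(2).mean()  # dummy loss, just for illustration
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```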
However, I’m wondering how to fit two models into one code base, since we have both a `teacher_model` and a `student_model`.
I tried something like:

```python
teacher_model, student_model, optimizer = accelerator.prepare(
    teacher_model, student_model, optimizer
)
```

but it doesn’t work.
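For reference, here is a minimal sketch of the two-model training loop I’m trying to get working; the toy models, the random data, and the temperature-scaled KL loss are stand-ins for my actual distillation setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from accelerate import Accelerator

accelerator = Accelerator()

# Toy stand-ins for the real 11B teacher and 2B student.
teacher_model = nn.Linear(16, 8)
student_model = nn.Linear(16, 8)
optimizer = torch.optim.AdamW(student_model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

# Passing both models through prepare() wraps each of them
# (e.g. in DistributedDataParallel under DDP).
teacher_model, student_model, optimizer, dataloader = accelerator.prepare(
    teacher_model, student_model, optimizer, dataloader
)
teacher_model.eval()  # the teacher stays frozen throughout

T = 2.0  # distillation temperature (hyperparameter)
for batch in dataloader:
    with torch.no_grad():
        teacher_logits = teacher_model(batch)
    student_logits = student_model(batch)

    # Soft-label distillation loss: KL between temperature-softened
    # student and teacher distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```

Since the teacher is frozen, I’m also not sure whether it needs to go through `prepare()` at all, or whether `teacher_model.to(accelerator.device)` plus `teacher_model.eval()` would be enough.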