Perform knowledge distillation using accelerate

I’m working on knowledge distillation from a relatively large teacher model (e.g., 11B parameters) to a smaller 2B student model in a DDP environment.

Usually, we would do the following to prepare the model:

model, optimizer = accelerator.prepare(model, optimizer)

However, I’m wondering how to fit two models into one codebase, since we have both a teacher_model and a student_model.
I tried something like

teacher_model, student_model, optimizer = accelerator.prepare(teacher_model, student_model, optimizer)

but it’s not working.
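
For context, here is a stripped-down sketch of the distillation loop I have in mind. The checkpoint names, temperature, and dataloader are just placeholders, and the soft-label KL loss is only meant as an illustration; the prepare() call on both models is the part I’m unsure about.

import torch
import torch.nn.functional as F
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Placeholder checkpoints; the real models are ~11B (teacher) and ~2B (student).
teacher_model = AutoModelForCausalLM.from_pretrained("teacher-checkpoint")
student_model = AutoModelForCausalLM.from_pretrained("student-checkpoint")
optimizer = torch.optim.AdamW(student_model.parameters(), lr=1e-5)

# This is the step in question: passing both models to prepare().
teacher_model, student_model, optimizer = accelerator.prepare(
    teacher_model, student_model, optimizer
)
teacher_model.eval()  # the teacher is frozen; only the student is trained

temperature = 2.0  # placeholder value

for batch in dataloader:  # dataloader (not shown) would also go through prepare()
    with torch.no_grad():
        teacher_logits = teacher_model(**batch).logits
    student_logits = student_model(**batch).logits

    # Soft-label KD loss: KL divergence between temperature-softened distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    accelerator.backward(kd_loss)
    optimizer.step()
    optimizer.zero_grad()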
