I’m working on knowledge distillation from a relatively large model to a smaller one (for example, an 11B teacher into a 2B student) in a DDP environment.
Usually, we do the following to prepare the model:

```python
model, optimizer = accelerator.prepare(model, optimizer)
```
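For context, a minimal self-contained version of that usual single-model loop might look like the sketch below; the toy `nn.Linear` model, the dummy loss, and the random data are just placeholders for my actual setup:

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()

model = nn.Linear(16, 8)  # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

# prepare() wraps the model for the current setup (e.g. DDP) and
# patches the optimizer and dataloader to match.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    loss = model(batch).pow(2).mean()  # dummy loss, just for illustration
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```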
However, I’m wondering how to fit two models into one code base, since we have both a `teacher_model` and a `student_model`.
I tried something like:

```python
teacher_model, student_model, optimizer = accelerator.prepare(
    teacher_model, student_model, optimizer
)
```

but it doesn’t work.
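For reference, here is a minimal sketch of the two-model training loop I’m trying to get working; the toy models, the random data, and the temperature-scaled KL loss are stand-ins for my actual distillation setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from accelerate import Accelerator

accelerator = Accelerator()

# Toy stand-ins for the real 11B teacher and 2B student.
teacher_model = nn.Linear(16, 8)
student_model = nn.Linear(16, 8)
optimizer = torch.optim.AdamW(student_model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

# Passing both models through prepare() wraps each of them
# (e.g. in DistributedDataParallel under DDP).
teacher_model, student_model, optimizer, dataloader = accelerator.prepare(
    teacher_model, student_model, optimizer, dataloader
)
teacher_model.eval()  # the teacher stays frozen throughout

T = 2.0  # distillation temperature (hyperparameter)
for batch in dataloader:
    with torch.no_grad():
        teacher_logits = teacher_model(batch)
    student_logits = student_model(batch)

    # Soft-label distillation loss: KL between temperature-softened
    # student and teacher distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```

Since the teacher is frozen, I’m also not sure whether it needs to go through `prepare()` at all, or whether `teacher_model.to(accelerator.device)` plus `teacher_model.eval()` would be enough.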