T5/mT5 model distillation

Hi Everyone,

I am trying to distill my T5 model. I am planning to use this script: transformers/distillation.py at main · huggingface/transformers · GitHub

Can anyone please guide me on how to set up my codebase for this process? If anyone has a better solution, please let me know.
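
For context, the core of a typical distillation setup is a soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal pure-Python sketch of that loss for a single token position. The function names and the default temperature are illustrative assumptions, not taken from the linked script; in a real T5 setup this would be computed with tensors over the decoder logits of both models, usually combined with the ordinary cross-entropy loss on the labels.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits, after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper for illustration only. The T^2 factor is the usual
    rescaling that keeps gradient magnitudes comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]  # made-up logits for one token position
    student = [3.0, 2.0, 1.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student's logits match the teacher's and grows as the two distributions diverge.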

I am also trying to distill a model, in my case from T5-xxl. Which teacher model did you choose?