I’m trying to read up on knowledge distillation, and as an exercise I’d like to fine-tune a GPT2-medium model on a specific generation task and then distill it down to a small GPT2 model. Could someone point me towards a Colab notebook or tutorial that I could use to learn hands-on how to do this? Thanks
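
For context, here is my rough understanding of the core training step I'd be implementing — a minimal sketch of the standard soft-target distillation loss (Hinton et al., 2015), assuming a fine-tuned GPT2-medium teacher and a GPT2 student; the temperature `T` and mixing weight `alpha` are illustrative values, not recommendations:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

# GPT2 and GPT2-medium share the same BPE vocabulary, so their
# logits are directly comparable token-for-token.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()  # frozen teacher
student = AutoModelForCausalLM.from_pretrained("gpt2")

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
T = 2.0      # softmax temperature (illustrative)
alpha = 0.5  # weight between distillation loss and hard-label loss (illustrative)

batch = tokenizer("example training text", return_tensors="pt")
input_ids = batch["input_ids"]

# Teacher forward pass without gradients.
with torch.no_grad():
    teacher_logits = teacher(input_ids).logits

# Student forward pass; passing labels gives the usual LM cross-entropy loss.
out = student(input_ids, labels=input_ids)

# KL divergence between temperature-softened teacher and student
# distributions, scaled by T^2 to keep gradient magnitudes comparable.
kd_loss = F.kl_div(
    F.log_softmax(out.logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T ** 2)

loss = alpha * kd_loss + (1 - alpha) * out.loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Does that match what the standard recipe looks like, and is there a worked notebook that builds this out properly (data loading, eval, etc.)?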