Distillation for LongT5

Hello,
Does anyone have experience with distilling LongT5?
I am trying to shrink the model down to ~30M parameters so I can run summarization on long documents in a resource-constrained environment, but I am not entirely sure how to go about reducing this specific model using the transformers Python library.

I am planning to drop some layers and mimic what I have seen done for regular T5 distillation, but I don't really know what I am doing, and there isn't much information available on LongT5 for this. Here is a rough sketch of what I have in mind:
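This is just a minimal layer-copying sketch along the lines of the "shrink and fine-tune" recipe used for BART/T5 distillation: build a student config with half the layers, then copy every other block from the teacher. The checkpoint name and the alternating-layer choice are only my assumptions, not anything LongT5-specific I've found documented.

```python
import copy

from transformers import LongT5ForConditionalGeneration

# Teacher checkpoint is an assumption; any LongT5 variant should work the same way.
teacher = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Student config: same architecture, half the encoder and decoder layers.
student_config = copy.deepcopy(teacher.config)
student_config.num_layers = teacher.config.num_layers // 2
student_config.num_decoder_layers = teacher.config.num_decoder_layers // 2
student = LongT5ForConditionalGeneration(student_config)

# Reuse the teacher's shared embeddings and LM head (tied by default).
student.shared.load_state_dict(teacher.shared.state_dict())
student.lm_head.load_state_dict(teacher.lm_head.state_dict())

def copy_blocks(teacher_stack, student_stack, teacher_layer_ids):
    """Copy the selected teacher transformer blocks into the student, in order."""
    for student_idx, teacher_idx in enumerate(teacher_layer_ids):
        student_stack.block[student_idx].load_state_dict(
            teacher_stack.block[teacher_idx].state_dict()
        )
    student_stack.final_layer_norm.load_state_dict(
        teacher_stack.final_layer_norm.state_dict()
    )

# Keep every other layer; layer 0 must stay because it holds the relative attention bias.
copy_blocks(teacher.encoder, student.encoder, range(0, teacher.config.num_layers, 2))
copy_blocks(teacher.decoder, student.decoder, range(0, teacher.config.num_decoder_layers, 2))

student.save_pretrained("long-t5-student")
```

My worry is that halving the layers of long-t5-tglobal-base still leaves far more than 30M parameters, so I assume I would also need to shrink d_model/d_ff (which means weights can no longer be copied directly) before fine-tuning the student on summarization data, possibly with a distillation loss against the teacher's logits.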

Any guidance would be appreciated.

Thanks!
Tarek