Hello
Does anyone have experience distilling LongT5?
I am trying to shrink the model to ~30M parameters so I can summarize long documents in a resource-constrained environment, but I am not sure how to simplify this specific model using the Hugging Face transformers library in Python.
My plan is to remove some layers and mimic what I have seen done with regular T5, but I don't really know what I am doing, and there is not much information available on distilling LongT5.
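To make the "reduce some layers" part concrete, here is a rough sketch of the one step I think I understand: choosing which teacher layers a smaller student would keep. It is only an assumption on my side that evenly spaced layers are a sensible starting point for LongT5; `pick_layers` is a hypothetical helper of mine, not part of transformers, and the layer counts are made up for illustration.

```python
from typing import List


def pick_layers(n_teacher: int, n_student: int) -> List[int]:
    """Return n_student evenly spaced teacher layer indices.

    Hypothetical helper: keeps the first and last teacher layers when
    n_student >= 2 and spreads the rest evenly in between.
    """
    if not 1 <= n_student <= n_teacher:
        raise ValueError("need 1 <= n_student <= n_teacher")
    if n_student == 1:
        return [0]  # arbitrary choice for the single-layer case
    step = (n_teacher - 1) / (n_student - 1)
    # Round half up so the spacing is predictable.
    return [int(i * step + 0.5) for i in range(n_student)]


# Example: a 12-layer teacher stack reduced to a 4-layer student stack.
print(pick_layers(12, 4))  # [0, 4, 7, 11]
```

The idea would be to copy the weights of those layers from the teacher into the student for both the encoder and the decoder, then fine-tune. What I don't know is whether this carries over cleanly to LongT5's local / transient-global attention layers.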
Any guidance would be appreciated.
Thanks!
Tarek