Distillation for LongT5

tarekziade · January 6, 2024, 9:20am

Hello
Does anyone have experience with distilling LongT5?
I am trying to reduce the model to ~30M parameters so I can summarize large documents in a resource-constrained environment, but I am not sure how to simplify this specific model using the transformers Python library.

I am planning to remove some layers and mimic what I see done with regular T5, but I don’t really know what I am doing, and there is not much information available on LongT5 for this.
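For concreteness, here is a minimal sketch of that layer-reduction idea, assuming the student keeps the teacher's width and is initialised by copying evenly spaced teacher blocks. `pick_layers` and `build_student` are hypothetical helper names, not part of transformers, and the evenly spaced selection is just one possible heuristic. Block 0 is always kept because, as in T5, it is the block that carries the relative attention bias.

```python
import copy


def pick_layers(n_teacher: int, n_student: int) -> list[int]:
    """Evenly spaced teacher layer indices, always keeping first and last."""
    if not 1 <= n_student <= n_teacher:
        raise ValueError("need 1 <= n_student <= n_teacher")
    if n_student == 1:
        return [0]
    span = n_student - 1
    # Integer arithmetic with round-half-up, so results are deterministic.
    return [(i * (n_teacher - 1) + span // 2) // span for i in range(n_student)]


def build_student(teacher, n_enc: int, n_dec: int):
    """Return a shallower LongT5 initialised from selected teacher blocks.

    `teacher` is assumed to be a LongT5ForConditionalGeneration instance.
    """
    from transformers import LongT5ForConditionalGeneration  # lazy import

    cfg = copy.deepcopy(teacher.config)
    cfg.num_layers = n_enc          # encoder depth
    cfg.num_decoder_layers = n_dec  # decoder depth
    student = LongT5ForConditionalGeneration(cfg)

    # Embeddings and output head keep their shape, so copy them as-is.
    student.shared.load_state_dict(teacher.shared.state_dict())
    student.lm_head.load_state_dict(teacher.lm_head.state_dict())

    stacks = ((student.encoder, teacher.encoder),
              (student.decoder, teacher.decoder))
    for s_stack, t_stack in stacks:
        keep = pick_layers(len(t_stack.block), len(s_stack.block))
        for s_idx, t_idx in enumerate(keep):
            s_stack.block[s_idx].load_state_dict(
                t_stack.block[t_idx].state_dict())
        s_stack.final_layer_norm.load_state_dict(
            t_stack.final_layer_norm.state_dict())
    return student


# Usage (downloads weights, so left commented out):
# from transformers import LongT5ForConditionalGeneration
# teacher = LongT5ForConditionalGeneration.from_pretrained(
#     "google/long-t5-local-base")
# student = build_student(teacher, n_enc=4, n_dec=2)
# print(sum(p.numel() for p in student.parameters()))
```

A student built this way still has to be trained, either by distillation against the teacher's outputs or by plain fine-tuning. Removing layers also leaves the embedding matrices untouched, so getting to ~30M parameters would likely require a smaller `d_model` as well, in which case the weights can no longer be copied directly.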

Any guidance would be appreciated.

Thanks!
Tarek
