BORT: Optimal Subarchitecture Extraction for BERT

FL33TW00D · December 4, 2020, 6:34pm

Hi guys,
Wondering if anyone has read the new paper from the Alexa team on BERT size reduction?

If anyone has thoughts on it or would like to discuss it, please comment here.

Thanks

Jung · December 5, 2020, 4:16am

Super interesting, thanks for sharing!! Perhaps @VictorSanh can give us the best comments.

Wondering if the same technique can be used efficiently for giant models like T5-11B and GPT-3?
