Hi guys,
Wondering if anyone has read the new paper from the Alexa team on BERT size reduction.
If anyone has thoughts on it or would like to discuss, please comment here.
Thanks
Super interesting, thanks for sharing! Perhaps @VictorSanh can give us the best comments.
Wondering if the same technique could be used efficiently for giant models like T5-11B and GPT-3.