Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

FL33TW00D January 12, 2021, 8:13am

Interesting new paper from Google improving upon T5.
