Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

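Interesting new paper from Google that improves upon T5: it swaps the dense feed-forward layers for sparse mixture-of-experts layers in which each token is routed to a single expert, so the parameter count can scale toward the trillion range while the compute per token stays roughly constant.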
