Hugging Face Forums
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Research
FL33TW00D
January 12, 2021, 8:13am
1
Interesting new paper from Google improving upon T5.
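The paper's core idea is to replace the dense feed-forward block in each Transformer layer with many expert FFNs and route every token to just one of them ("switch" routing), so parameter count grows while per-token compute stays roughly constant. A minimal sketch of that top-1 routing, assuming a PyTorch-style setup (the names `SwitchFFN`, `n_experts`, `d_model`, `d_ff` are illustrative, not from the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Sparse FFN layer: each token is processed by exactly one expert."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # produces per-token expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)           # top-1 expert and its router probability
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens routed to expert e
            if mask.any():
                out[mask] = gate[mask, None] * expert(x[mask])  # scale output by gate prob
        return out

# Usage: route 16 token vectors through the sparse layer.
layer = SwitchFFN()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

This omits the capacity limits and load-balancing auxiliary loss the paper uses to keep experts evenly utilized; it only illustrates the routing itself.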
3 Likes