First instalment of the Muon Optimizer tutorial series

:glowing_star: I just published the first part of a tutorial series on the Muon Optimizer.

Muon (Momentum Orthogonalized by Newton-Schulz) is quickly becoming the go-to optimizer for large-scale training. It already powers trillion-parameter frontier models like Kimi K2 (via MuonClip) and was critical for the ATLAS paper, where first-order optimizers failed.

In this series, I’m breaking Muon down step by step: intuition, pseudocode, PyTorch implementation, and practical guidance on when/where to use it.
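To give a flavour of the core idea before the full post: Muon replaces a raw momentum update with an approximately orthogonalized one. A minimal sketch of the Newton-Schulz step, using NumPy instead of PyTorch for brevity (the quintic coefficients and the Frobenius-norm pre-scaling follow Keller Jordan's public reference implementation; treat the exact values as an assumption, not part of my tutorial):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize the momentum matrix G.

    Runs a quintic Newton-Schulz iteration that drives the singular
    values of G toward 1 while keeping its singular vectors.
    Coefficients (a, b, c) are taken from the reference Muon code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize by the Frobenius norm so the top singular value is <= 1
    # and the iteration stays in its basin of convergence.
    X = G / (np.linalg.norm(G) + 1e-7)
    # Work with the "wide" orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

Note the iteration does not produce an exactly orthogonal matrix; after ~5 steps the singular values land in a narrow band around 1, which is good enough for the optimizer while staying cheap (a few matmuls per parameter matrix). The series covers why that trade-off works.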

:link: Medium post

Also — I’d really like to contribute this as a guest article to the Hugging Face blog. I know the blog is managed by a group, but it looks like external contributors can’t directly join. If anyone here has advice or connections on how to submit contributions, I’d love to hear it :folded_hands:

Muon deserves more attention in the open-source community, and I’d be excited to help bridge that gap.


It seems the standard procedure is to press the Join button and wait for approval, or to open an issue or PR on GitHub. If you are in a hurry, contacting the staff via email or Discord may be quicker: website@huggingface.co


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.