I just published the first part of a tutorial series on the Muon Optimizer.
Muon (Momentum Orthogonalized by Newton-Schulz) is quickly becoming the go-to optimizer for large-scale training. It's already powering trillion-parameter frontier models like Kimi K2 (via the MuonClip variant) and was critical for the ATLAS paper, where first-order optimizers failed.
In this series, I’m breaking Muon down step by step: intuition, pseudocode, PyTorch implementation, and practical guidance on when/where to use it.
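To give a taste of what the series covers, here's a minimal sketch of the core update: keep a momentum buffer per 2D weight matrix, approximately orthogonalize it with a quintic Newton-Schulz iteration, and take that as the step. The (a, b, c) coefficients follow the public Muon reference implementation; the function names (`newton_schulz`, `muon_step`) and the hyperparameter values here are my own, for illustration only.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately orthogonalize G: push its singular values toward 1
    # using a quintic Newton-Schulz iteration (coefficients from the
    # public Muon reference implementation).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                      # iterate on the wide orientation
    X = X / (X.norm() + 1e-7)        # normalize so the spectral norm is <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(params, momenta, lr=0.02, beta=0.95):
    # One Muon update over a list of 2D weight matrices; `momenta` holds
    # one buffer per parameter (same shape, initialized to zeros).
    for p, m in zip(params, momenta):
        m.mul_(beta).add_(p.grad)            # heavy-ball momentum
        p.add_(newton_schulz(m), alpha=-lr)  # orthogonalized step

# Toy usage with a stand-in gradient:
W = torch.randn(512, 2048, requires_grad=True)
W.grad = torch.randn_like(W)
M = torch.zeros_like(W)
muon_step([W], [M])
```

In practice the reference implementation runs the Newton-Schulz iteration in bfloat16 and applies Muon only to the hidden-layer matrices, keeping embeddings and scalar parameters on AdamW, but the core idea really is just these few lines. The series goes into why orthogonalizing the momentum works.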
Also, I'd really like to contribute this series as a guest article to the Hugging Face blog. I know the blog is managed by a group, but external contributors don't seem to be able to join it directly. If anyone here has advice or connections on how to submit a guest post, I'd love to hear it.
Muon deserves more attention in the open-source community, and I’d be excited to help bridge that gap.