Technical blog post -- streaming algorithms and numerical stability in ML systems

Hey everyone — I published a technical blog post today on streaming algorithms and numerical stability in ML systems, using FlashAttention as the main example.

What it covers:
FlashAttention’s tiled computation avoids materializing the full attention matrix — but doing so requires maintaining numerically stable running statistics (running max, normalization constant, output accumulator). This turns out to be the same design constraint behind stable softmax, log-sum-exp, and Welford’s variance algorithm.
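To make the running-statistics idea concrete, here's a minimal 1-D sketch (my own toy code, not the post's, and much simpler than a real tiled attention kernel): it streams score/value blocks and keeps only the three quantities above — a running max, a normalizer, and an output accumulator.

```python
import numpy as np

def streaming_softmax_average(score_blocks, value_blocks):
    """Compute sum_i softmax(scores)_i * values_i one block at a time.

    Only three running statistics survive across blocks:
      m   -- running max of all scores seen so far
      l   -- running normalizer, sum of exp(score - m)
      acc -- running unnormalized output, sum of exp(score - m) * value
    """
    m, l, acc = -np.inf, 0.0, 0.0
    for s, v in zip(score_blocks, value_blocks):
        s = np.asarray(s, dtype=np.float32)
        v = np.asarray(v, dtype=np.float32)
        m_new = max(m, float(s.max()))
        # Rescale the old statistics to the new max before folding in the
        # block; this keeps every exp() argument <= 0 and overflow-free.
        scale = np.exp(m - m_new)
        w = np.exp(s - m_new)
        l = l * scale + float(w.sum())
        acc = acc * scale + float((w * v).sum())
        m = m_new
    return acc / l
```

The rescaling by exp(m_old − m_new) is the load-bearing step: it lets statistics computed under an earlier, smaller max be merged with a later block, which is the same trick stable log-sum-exp relies on.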

The post traces that common pattern and includes two small experiments:

  • Variance: four mathematically equivalent formulas that produce wildly different results under float32 (including one that returns -65,542 when the correct answer is ~1)
  • Softmax: naive vs. subtract-max, showing how overflow propagates to NaN (rough sketches of both experiments below)
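If it helps to picture them, here's roughly the shape of the two experiments — a sketch with made-up constants, not the exact data or the specific −65,542 run from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Variance: same math, very different float32 answers ---
# Data with a large mean relative to its spread (true variance ~ 1).
x = (rng.standard_normal(1_000_000) + 10_000.0).astype(np.float32)

# One-pass textbook formula E[x^2] - E[x]^2: both terms are ~1e8 and
# nearly cancel, so float32 round-off can swamp (or even negate) the result.
naive_var = float(np.mean(x * x) - np.mean(x) ** 2)

# Two-pass formula: subtract the mean first, then square. Stable.
two_pass_var = float(np.mean((x - np.mean(x)) ** 2))

# Welford's online algorithm: a single pass, still stable.
mean, m2, n = 0.0, 0.0, 0
for xi in x[:100_000]:          # subset just to keep the Python loop quick
    n += 1
    delta = xi - mean
    mean += delta / n
    m2 += delta * (xi - mean)
welford_var = m2 / n

# --- Softmax: naive vs. subtract-max ---
logits = np.array([500.0, 510.0, 520.0], dtype=np.float32)

naive = np.exp(logits) / np.exp(logits).sum()   # exp(500) overflows -> inf/inf -> nan
shifted = np.exp(logits - logits.max())         # largest argument is 0, no overflow
stable = shifted / shifted.sum()
```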

Why I wrote it:
A lot of the “implementation details” in ML infrastructure aren’t really details — they’re load-bearing. I wanted to write something that made that concrete rather than just asserting it.

Would love feedback, especially:

  • Are there other examples of this pattern I should have included?
  • Anything in the numerical stability section that could be sharper?

:link: to post:

:nerd_face: Jen
