Grokking Beyond Addition

Hi everyone,

I’m excited to share my research paper:
“Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”

:page_facing_up: Paper: https://zenodo.org/records/19256207

This work explores grokking across multiple algebraic structures and shows a clear result:
At small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups, even after reaching 100% training accuracy.

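As context for this result, here's a minimal sketch of what the two task families look like as training data in typical grokking setups. The specific groups (Z_97 for the abelian case, S_5 for the non-abelian one) and all names here are my illustration, not the paper's exact code:

```python
# Illustrative sketch (not the paper's code): each example is a pair (a, b)
# labeled with the group composition a ∘ b, as in standard grokking setups.
import itertools
import random

def abelian_examples(p=97):
    """All (a, b, a+b mod p) triples for the cyclic (abelian) group Z_p."""
    return [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]

def nonabelian_examples(n=5, num_pairs=10_000, seed=0):
    """Sampled (g, h, g∘h) triples for the symmetric (non-abelian) group S_n.

    Each permutation is indexed so every group element maps to one token id.
    """
    perms = list(itertools.permutations(range(n)))   # |S_5| = 120 elements
    index = {g: i for i, g in enumerate(perms)}
    rng = random.Random(seed)
    triples = []
    for _ in range(num_pairs):
        g, h = rng.choice(perms), rng.choice(perms)
        gh = tuple(g[h[i]] for i in range(n))        # compose: (g∘h)(i) = g(h(i))
        triples.append((index[g], index[h], index[gh]))
    return triples

train = abelian_examples()
random.Random(1).shuffle(train)
print(len(train), train[0])   # 9409 triples of token ids
```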
It also highlights:

  • Early circuit formation before generalization

  • Evidence for a discrete-log structure in multiplication (worked example after this list)

  • Strong embedding similarity across different tasks, measured via CKA (sketch after this list)

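On the discrete-log point: the multiplicative group mod a prime p is cyclic, so multiplication mod p is isomorphic to addition of discrete logarithms mod p-1. That is one reason a network with addition-style circuits could represent multiplication. A quick self-contained illustration (my own numbers; the paper's analysis is at the circuit level):

```python
# Multiplication mod p becomes addition of discrete logs mod p-1:
# every nonzero x mod p equals g^k for a generator g, and
# x * y = g^(k_x + k_y mod p-1).
p, g = 97, 5                        # 5 is a primitive root mod 97

# Build the discrete-log table: dlog[g^k mod p] = k
dlog, x = {}, 1
for k in range(p - 1):
    dlog[x] = k
    x = (x * g) % p

a, b = 13, 42
k = (dlog[a] + dlog[b]) % (p - 1)   # add logs instead of multiplying
assert pow(g, k, p) == (a * b) % p  # g^k recovers the product mod p
print((a * b) % p, pow(g, k, p))    # 61 61
```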

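And on CKA: linear centered kernel alignment (Kornblith et al., 2019) is straightforward to reproduce. A minimal NumPy version of the standard formula (which layers and tasks get compared is detailed in the paper, not here):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X, Y of shape (n_examples, d).

    Rows must be paired across the two tasks (e.g. same token ids).
    """
    X = X - X.mean(axis=0)          # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = (np.linalg.norm(X.T @ X, ord="fro") *
           np.linalg.norm(Y.T @ Y, ord="fro"))
    return num / den                # normalized so linear_cka(X, X) == 1

# CKA is invariant to orthogonal transforms, so a rotated copy scores ~1.0:
rng = np.random.default_rng(0)
E_a = rng.normal(size=(97, 64))                  # stand-in embedding matrix
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthogonal matrix
print(linear_cka(E_a, E_a @ Q))                  # ≈ 1.0
```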
I’m opening this project for collaboration and contributions:

  • Scaling experiments (d_model = 128 / 256)

  • Extending to more algebraic structures

  • Interpretability improvements

  • Reproduction and benchmarking

If you’re interested in mechanistic interpretability, grokking, or theory-driven ML, feel free to contribute, open issues, or reach out. Let’s build this together.
