Hi everyone,
I turned my Master’s research on stabilizing very deep Transformers into an open-source PyTorch library called AION-Torch (aion-torch on PyPI). It implements an adaptive residual layer that computes x + α·y, choosing α from the input/output energy rather than using a fixed residual weight.
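To make the idea concrete, here is a minimal sketch of what an energy-scaled residual could look like. Note this is my own illustrative approximation, not the library's actual code: the class name `AdaptiveResidual` and the specific rule (α as the ratio of RMS norms of x and y) are assumptions; see the repo for the real implementation.

```python
import torch
import torch.nn as nn


class AdaptiveResidual(nn.Module):
    """Illustrative sketch of an energy-scaled residual connection.

    Instead of a fixed residual weight, alpha is derived from the
    "energies" (RMS norms) of the block input x and block output y,
    so a branch whose output is much louder than its input gets damped.
    This is an assumption about the mechanism, not AION-Torch's code.
    """

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Per-token RMS energy of input and branch output.
        ex = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        ey = y.pow(2).mean(dim=-1, keepdim=True).sqrt()
        # Damp y when its energy exceeds the input's energy.
        alpha = ex / (ey + self.eps)
        return x + alpha * y
```

With this rule, a zero branch output leaves x untouched, and a branch whose output matches the input's energy behaves like a standard residual (α ≈ 1).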
On my personal RTX 4060 I ran a 600-layer Pre-LN Transformer comparison where AION appeared to give more stable gradient norms and a lower final loss than the standard residual baseline. My compute is very limited, though, so I’d love to see how it behaves in more realistic settings.
Repo: https://github.com/Croxus-Labs/aion-torch/
PyPI: https://pypi.org/project/aion-torch/
This is an alpha research project, so honest feedback and criticism are very welcome.
