My Master’s research turned into a PyTorch layer that calms down unstable Transformers

Hi everyone,

I turned my Master’s research on stabilizing very deep Transformers into an open-source PyTorch library called AION-Torch (aion-torch on PyPI). It implements an adaptive residual layer that computes x + α·y, where α is scaled based on the input/output energy instead of being a fixed residual weight.
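To make the idea concrete, here is a minimal sketch of what an energy-based adaptive residual connection could look like. This is my illustrative guess at the mechanism described above, not AION-Torch's actual implementation; the class name `AdaptiveResidual` and the RMS-based definition of "energy" are assumptions.

```python
import torch
import torch.nn as nn


class AdaptiveResidual(nn.Module):
    """Sketch of an energy-based adaptive residual: out = x + alpha * y.

    alpha is computed from the ratio of the input's RMS energy to the
    sublayer output's RMS energy, so the branch contribution stays on a
    comparable scale to the residual stream instead of using a fixed weight.
    """

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # avoids division by zero for near-silent branches

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # RMS "energy" over the feature dimension
        ex = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        ey = y.pow(2).mean(dim=-1, keepdim=True).sqrt()
        # Scale the sublayer output so its energy matches the input's
        alpha = ex / (ey + self.eps)
        return x + alpha * y


layer = AdaptiveResidual()
x = torch.randn(2, 16, 64)
y = 1000.0 * torch.randn(2, 16, 64)  # deliberately exploding branch
out = layer(x, y)
```

Even with a branch whose raw magnitude is ~1000x the input's, the output stays on the same scale as `x`, which is the kind of behavior that would help very deep stacks where fixed-weight residuals let activations grow layer by layer.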

On my personal RTX 4060 I ran a 600-layer Pre-LN Transformer test where AION appeared to give more stable gradients and a lower final loss than the standard baseline. My compute is very limited, though, so I’d love to see how it behaves in more realistic settings.

Repo: https://github.com/Croxus-Labs/aion-torch/
PyPI: https://pypi.org/project/aion-torch/

This is an alpha research project, so honest feedback and criticism are very welcome.

4 Likes

Are you talking about something like the action potential threshold in human neurons? I just checked: it isn’t the same mechanism, but it serves the same function.
The action potential threshold gives the nervous system several crucial advantages, ensuring reliable, high-fidelity, and regulated signal transmission.

1 Like

That’s a nice analogy. It isn’t a hard threshold, though, but a continuous scaling based on input/output energy; the goal is similar: keep signals in a stable, high-fidelity regime instead of letting them blow up or die out.

2 Likes

I’m going to give this a try. Are these the metrics you want us to track?

2 Likes

Yes, I’d be happy if you gave it a try!

2 Likes