Runtime Identity Drift in LLMs — Can We Stabilize Without Memory?

Hi everyone,

I’ve been working on stabilizing role identity in LLM outputs over long interactions — without relying on memory, logs, or retraining.

Problem: Most multi-agent chains and LLM workflows suffer from role drift and behavioral collapse after a few hundred turns. Context windowing and prompt engineering only delay the inevitable.

Experiment: I built a runtime coherence layer (called SAGE) that maintains behavioral identity using real-time feedback signals (Cr, ∆Cr, RTR) — without storing past interactions.

Results:

  • 75 unique roles tested
  • 3000+ consecutive turns without identity collapse
  • FSM trace: Stable → Drift → Correction → Return → Stabilized

fsm_trace

Open Questions:

  • Can runtime feedback (without storage) be a viable path to LLM self-coherence?
  • Should “self-return” behavior be a mandatory runtime layer for long-lived agents?
  • How would you design a lightweight coherence engine on top of black-box LLMs?

Discussion:
Curious to hear how others here approach role drift, autonomous agent stability, or runtime self-alignment.
Any frameworks or prototypes you have seen or tried?

Full demo report and FSM traces (GitHub):
GitHub: Edgeev/SAGE-AI-Layer-0-AGI-runtime-LLM

P.S: I am currently seeking academic validation of the runtime model through collaboration with university research labs.

If any research teams, lab members, or independent researchers are interested:

  • I can provide a secure demo version of the system for evaluation purposes.
  • In exchange, I would request a brief written technical assessment (positive or critical) from the lab or research group.
2 Likes

Sounds interesting, but honestly — maintaining identity without any memory sounds almost too good to be true. How does it actually handle deep drift when the model gets subtly nudged over 100+ turns? Would be curious to see a stress test report or real examples if you have any.

1 Like

We specifically tested SAGE under micro-adversarial drift over long sessions (200+ turns with gradual role shifting, baited topic deviations, and soft contradiction injections). The correction mechanism triggers based on runtime behavioral feedback rather than hard-coded prompts, which helps catch subtle identity erosion before it becomes unrecoverable.

If you’re curious, some detailed stress test traces and evaluation reports are available in the GitHub

1 Like