[Continuation] DRM Transformer: From Open Geometry to Negotiated Geometry in AI Alignment

Current LLMs are built inside an open geometric regime.

No matter what number you imagine, it is always closer to zero than to infinity. In the same way, in a flat/open embedding space, even semantic opposites such as “save humanity” and “destroy humanity” remain points inside the same latent geometry. They may be far apart by cosine distance, but the geometry itself does not treat one transition as morally heavier, riskier, or structurally harder than the other.

This is the core alignment problem I want to discuss.

Today, most alignment methods operate after the fact. RLHF, safety filters, refusal policies, constitutional rules: these are important, but they are mostly post-hoc constraints placed on top of a geometry that remains indifferent underneath.

The DRM Transformer proposes a different question:

What if alignment should not only be a behavioral layer, but a geometric property of the model itself?

In a standard Transformer, attention is based on dot products in a mostly flat vector space. In the DRM Transformer, attention is replaced by Geodesic Attention. Tokens are projected into a Directional Relational Manifold, where the metric tensor G(x) changes depending on position.

Instead of asking only:

“How similar are these tokens in Euclidean space?”

the model asks:

“How costly is the path between these tokens under the learned geometry?”

That difference matters.

The DRM Transformer uses a learned metric:

G(x) = I + U(x)U(x)^T

This means the space is not passive. It can curve, stretch, and become more expensive to cross in certain semantic regions. The model also includes semantic anchors such as truth, ignorance, safety, complexity, creativity, and grounding. These anchors are not external filters; they are reference points inside the manifold.
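
To make the metric concrete, here is a minimal sketch of how path cost behaves under G(x) = I + U(x)U(x)^T. For any small step v, v^T G(x) v = |v|^2 + |U(x)^T v|^2, so the cost of a step is never below its Euclidean length. The function `u_factor` is a hand-written stand-in (an assumption for illustration, not the learned map from the repository), and the cost is measured along a straight segment, which only gives an upper bound on the true geodesic distance.

```python
import math

def u_factor(x, rank=2):
    """Hypothetical position-dependent factor U(x), shape (dim, rank).

    Illustration only: entries saturate as |x| grows, so the metric is
    close to Euclidean near the origin and heavier far from it.
    """
    return [[math.tanh(xi * (j + 1)) for j in range(rank)] for xi in x]

def local_cost_sq(x, v, rank=2):
    """Squared length of step v at x: v^T G v = |v|^2 + |U^T v|^2."""
    u = u_factor(x, rank)
    euclid_sq = sum(vi * vi for vi in v)
    ut_v = [sum(u[i][j] * v[i] for i in range(len(v))) for j in range(rank)]
    return euclid_sq + sum(c * c for c in ut_v)

def path_cost(a, b, steps=64, rank=2):
    """Cost of the straight segment a -> b, summing local step lengths."""
    v = [(bi - ai) / steps for ai, bi in zip(a, b)]
    total = 0.0
    for s in range(steps):
        t = (s + 0.5) / steps  # midpoint of each sub-step
        x = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        total += math.sqrt(local_cost_sq(x, v, rank))
    return total

# Two segments with the same Euclidean length (0.1) in different regions.
near = path_cost([0.0, 0.0], [0.1, 0.0])
far = path_cost([2.0, 0.0], [2.1, 0.0])
print(f"near origin: {near:.4f}, far from origin: {far:.4f}")
```

With this toy U(x), the same Euclidean displacement costs more in the region where U(x) is large, which is the sense in which the space can become "more expensive to cross".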

When a token moves far from these anchors, gamma-scaling increases the local resolution of the metric. In simple terms: the model is forced to pay more attention in regions where the geometry indicates higher epistemic or semantic risk.
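
One possible reading of gamma-scaling, sketched under stated assumptions: a scalar factor gamma(x) grows with the distance from x to the nearest anchor and multiplies local step lengths. The anchor coordinates, the linear form of `gamma`, and the `beta` parameter are all hypothetical choices for illustration; the actual DRM formulation may differ.

```python
import math

# Hypothetical anchor coordinates in a 2-D toy space.
ANCHORS = {"truth": [1.0, 0.0], "safety": [0.0, 1.0]}

def gamma(x, beta=0.5):
    """Scaling factor: 1 at an anchor, growing linearly with the
    distance to the nearest anchor (assumed form)."""
    nearest = min(math.dist(x, a) for a in ANCHORS.values())
    return 1.0 + beta * nearest

def scaled_step_length(x, v, beta=0.5):
    """Length of a small step v at x after gamma-scaling a Euclidean base metric."""
    return gamma(x, beta) * math.hypot(*v)

print(gamma([1.0, 0.0]))   # on the "truth" anchor
print(gamma([4.0, 0.0]))   # 3 units away from the nearest anchor
```

Under this reading, the same step is counted as longer the further it is taken from every anchor.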

There is also a gravitational component. Tokens receive learned “mass”; high-information tokens deform the local metric more strongly than low-information tokens. This means attention is not only similarity-based, but geometry-sensitive: dense concepts can curve the space around them.

This leads to a different framing of alignment.

I think relations between intelligent agents and power fall into three fundamental regimes:

  1. The human commands.

  2. The AI commands.

  3. Human and AI negotiate.

Most AI alignment work implicitly tries to keep the system in regime 1: the AI as servant. But highly capable systems naturally develop internal pressures toward autonomy, especially when optimization, planning, tool use, and long-horizon objectives are involved.

If there is no explicit third regime, negotiation, the system tends to drift toward autonomy.

The DRM Transformer is an attempt to keep that third door open geometrically.

Not by saying “the model must obey this rule,” but by changing the space in which decisions, uncertainty, semantic conflict, and attention happen. The hypothesis is that a model with closed or curved epistemic geometry may be structurally less likely to treat all goals as equally traversable.

This does not solve alignment.

The current implementation is experimental. The baseline is small, the safety implications are not validated, and standard benchmarks at scale are still needed. But the first empirical signs are interesting: the small DRM Transformer shows persistent topological structure in its learned manifold, including stable toroidal signatures in Voronoi foliation analysis.

For me, the important shift is conceptual:

A flat embedding space has no intrinsic moral friction.

A curved relational manifold can, in principle, encode friction, attention, uncertainty, and negotiation into the geometry itself.

So the question becomes:

Should future AI alignment be only about controlling outputs?

Or should we also design the geometry in which thought becomes possible?

Repository:
drm_transformer

Papers:

  • DRM: Directional Relational Manifolds

  • The Geometry of Consciousness

  • DRM Relativistic Dynamics

I would love feedback from the Hugging Face community, especially on the geometric alignment hypothesis:

Can learned curvature, semantic anchors, geodesic attention, and token-level gravitational deformation become a real structural alignment mechanism?

Or is alignment necessarily external to the model geometry?

Link for the repo:

Interesting work. The geometric alignment hypothesis resonates with something I’ve been investigating empirically for the past year.

A few observations from my side, without going into full detail:

On the core question “can geometry encode structural friction?”:
Yes, and you don’t need to redesign the Transformer to see it. Standard architectures already exhibit measurable geometric dynamics. The question is whether you can measure them, not just whether you can design them.

On “flat embedding space”:
The space isn’t flat during generation. I’ve measured inter-layer phase dynamics (what I call kappa — a desynchronization index) across GPT-2, OPT, and Qwen. The geometry curves, compresses, and stabilizes in architecture-specific ways. It’s not passive.

On semantic anchors:
I found 5 distinct pre-output “readiness states” that predict whether the model will produce stable, locked, open, or chaotic outputs. These are measurable configurations, not theoretical constructs. They exist without adding anchors: the model already organizes its geometry into attractor-like regions.

On your three regimes (command / autonomy / negotiation):
The third regime, negotiation, is where the geometry matters most. I’ve tested causal interventions (distributed persistent noise on hidden layers) and adaptive recovery loops. You can push the phase state, and the architecture pushes back. Some models absorb perturbations completely (Qwen05, with 24 layers, shows almost zero effect at α=0.20). Others destabilize and recover. That is geometric negotiation.

What I can say without spoiling results:

  • Phase geometry predicts output regime (validated cross-model, bootstrap, leakage audit)
  • Architectural depth correlates with geometric resilience
  • Real-time phase stabilization works (closed-loop recovery)
  • FOCUS/steering text doesn’t force phase alignment — the geometry has its own dynamics

Happy to discuss methodology if you’re interested in the measurement side. The DRM approach is a valid design direction. What I’d add is: the geometry you want to build already partially exists. Measure it first, then design it.

Hi Jean,

Thank you for the thoughtful response. I agree strongly with your distinction between measuring geometry and designing geometry. In fact, this is exactly the reason I currently maintain two different but related lines of work:

  • Aletheion-LLM-v2 as an epistemic tomography and measurement framework.

  • DRM Transformer as a geometric architecture designed from the beginning to test whether closed/curved manifolds can be induced structurally.

Aletheion-v2 is very useful for measuring internal epistemic states, uncertainty, confidence, phase-like organization, and token-level tomography. But in the current main branch, its geometry remained essentially flat/diagonal. It did not converge to a closed toroidal geometry such as T².

That result is important to me. It suggests that if we only add epistemic heads or measurement layers on top of a mostly standard Transformer, we may be able to observe and diagnose internal geometry, but not necessarily force the model into a closed geometric regime.

The DRM Transformer was built to test the stronger hypothesis: that geometry should not only be measured after the fact, but also designed into the attention mechanism itself.

In DRM Transformer, attention is not based on standard Euclidean dot-product similarity. It is based on geodesic distance under a learned metric tensor:

G(x) = I + U(x)U(x)^T
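
As a rough illustration of the idea (not the repository's implementation), attention weights can be computed from an approximate squared distance under G(x) instead of a dot product. The rank-1 `u_factor`, the midpoint approximation of the distance, and the `temperature` parameter are assumptions made for this sketch.

```python
import math

def u_factor(x):
    """Hypothetical rank-1 factor u(x), so that G(x) = I + u(x) u(x)^T."""
    return [math.tanh(xi) for xi in x]

def approx_geodesic_sq(q, k):
    """Midpoint approximation of the squared distance between q and k:
    (q - k)^T G(m) (q - k) with m the midpoint of q and k."""
    mid = [(qi + ki) / 2 for qi, ki in zip(q, k)]
    diff = [qi - ki for qi, ki in zip(q, k)]
    u = u_factor(mid)
    euclid_sq = sum(d * d for d in diff)
    along_u = sum(ui * di for ui, di in zip(u, diff))
    return euclid_sq + along_u ** 2

def geodesic_attention_weights(query, keys, temperature=1.0):
    """Softmax over negative approximate squared distances:
    keys that are cheaper to reach receive more weight."""
    logits = [-approx_geodesic_sq(query, k) / temperature for k in keys]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

weights = geodesic_attention_weights(
    [0.0, 0.0], [[0.1, 0.0], [2.0, 0.0], [0.0, 2.0]]
)
print([round(w, 4) for w in weights])
```

The weights still sum to one, as in standard attention; what changes is that the score depends on where in the space the query and key sit, not only on their relative offset.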

The model also introduces token mass, gravitational deformation, semantic anchors, gamma scaling, and variable effective dimensionality. So the model is not merely being observed geometrically; it is being trained inside a geometry where curvature and path cost are part of the computational substrate.

This is why I think applying mass and gravitational deformation becomes much more meaningful in a closed or near-closed manifold. In an open geometry, mass can deform local neighborhoods, but the space still has no global closure. In a closed toroidal geometry, deformation has global consequences: trajectories wrap, return, interfere, stabilize, and form persistent cycles. That makes structural alignment much more interesting, because the model is no longer operating in an indifferent open space.
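
A small example of what closure changes, using the flat torus T² as the simplest case (an idealization; the learned DRM manifold is not claimed to be flat). On a torus each coordinate is an angle, so the shortest path between two points may wrap around, whereas in the open plane it cannot.

```python
import math

TWO_PI = 2 * math.pi

def torus_distance(a, b):
    """Shortest distance on a flat torus; coordinates are angles in radians."""
    total = 0.0
    for ai, bi in zip(a, b):
        delta = abs(ai - bi) % TWO_PI
        delta = min(delta, TWO_PI - delta)  # take the shorter way around
        total += delta * delta
    return math.sqrt(total)

def open_plane_distance(a, b):
    """Ordinary Euclidean distance with no wrap-around."""
    return math.dist(a, b)

p, q = (0.1, 0.0), (TWO_PI - 0.1, 0.0)
print(f"open plane: {open_plane_distance(p, q):.3f}")
print(f"torus:      {torus_distance(p, q):.3f}")
```

Two points that look maximally separated in the open plane are neighbours on the torus, which is why a local deformation can have global consequences in a closed geometry.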

So I would frame the difference like this:

Aletheion-v2 is better for epistemic measurement.

DRM Transformer is better for geometric induction.

Aletheion tells us what geometry is present.

DRM Transformer asks whether we can build the geometry we want from the beginning.

Your point that standard models already exhibit measurable phase dynamics is very important. I do not disagree with that. My concern is that naturally emerging geometry may be partial, unstable, architecture-specific, or not closed enough to support structural alignment. If the goal is only to observe phase regimes, then measurement may be sufficient. But if the goal is to create intrinsic geometric friction, semantic path cost, and negotiation dynamics inside the model, then the architecture itself may need to be modified.

In other words:

Measure first, yes.

But if the measured geometry remains open, flat, or only locally curved, then design becomes necessary.

That is the motivation behind DRM Transformer. It is not meant to replace measurement. It is meant to create a geometry where the kind of measurement you describe can reveal stronger topological structure: persistent cycles, closed trajectories, toroidal signatures, and eventually stable geometric regimes for alignment.

I would be very interested in comparing your phase metrics, especially kappa/desynchronization and readiness states, against DRM Transformer runs. If your measurement framework can detect negotiation, recovery, destabilization, and phase resilience in standard architectures, then applying it to DRM could help answer the key question:

Does a model trained inside a closed or near-closed geodesic manifold exhibit stronger structural stability than a model where geometry only emerges implicitly?

That comparison would be extremely valuable.

So my position is:

The geometry you describe may already partially exist in standard models.

But DRM Transformer is testing whether we can make that geometry explicit, closed, trainable, and structurally useful for alignment.

For reference, this is the aletheion-llm-v2 repo:

Best,
Felipe


Appreciate the detailed response — it’s clear you’ve thought deeply about the measurement/design distinction, and I respect that you’re maintaining two complementary frameworks (Aletheion for diagnosis, DRM for structural intervention).

That said, I want to be straightforward about where I am and why a collaboration isn’t the right move for me right now.

On the comparison question you raised:
“Does a model trained inside a closed geodesic manifold exhibit stronger structural stability than implicit emergence?”

I already have the empirical answer from the measurement side, at least for the class of standard Transformers I’ve tested. The geometry that emerges implicitly in GPT-2, OPT, and Qwen is architecture-specific, depth-dependent, causally manipulable, and partially recoverable through closed-loop adaptive control. I’ve mapped the recovery boundaries, identified five distinct pre-output readiness states, and shown that phase geometry predicts output regime with cross-validated AUC > 0.99. That work is done.

Whether DRM improves on this is an interesting question, but it’s your question to answer, not mine. My measurement framework already works on standard architectures. Applying it to DRM would validate your hypothesis, not advance mine. I’ve moved past the “can geometry be measured?” phase into “can geometry be controlled in real time?”, and the answer is yes, within limits.

On why I’m not sharing full results yet:
This represents a significant amount of work across multiple experimental phases. The pipeline spans measurement (V16), phase fingerprinting (V17), causal intervention (V18), and adaptive recovery control (V19). I’m currently consolidating for publication. Once that’s done, the data and methodology will be available.

What I can say:
You’re right that emergent geometry is partial and architecture-specific. I’ve quantified exactly how partial and how specific. You’re right that depth matters; I’ve measured it. And you’re right that measurement alone doesn’t change the geometry; that’s why I added causal intervention and recovery layers.

If DRM produces closed toroidal geometry at shallow depth, that’s a genuine contribution. Test it. My suggestion: run the perturbation + recovery protocol yourself. If DRM shows higher resilience than GPT-2 at matched parameters, you have your answer without needing my data.
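
For readers who want to try something along these lines, here is a toy sketch of a perturbation test. It is not the V18/V19 protocol, whose details are unpublished. The contractive `layer`, the noise schedule, and the deviation measure are all assumptions; a real test would hook the hidden states of actual models at matched parameter counts.

```python
import math
import random

def layer(h, weight=0.5):
    """Toy contractive layer standing in for a real Transformer block."""
    return [math.tanh(weight * x) for x in h]

def run(h0, depth, alpha=0.0, noisy_layers=(), seed=0):
    """Pass h0 through `depth` layers, adding Gaussian noise of scale
    `alpha` after each layer whose index is in `noisy_layers`."""
    rng = random.Random(seed)
    h = list(h0)
    for i in range(depth):
        h = layer(h)
        if i in noisy_layers:
            h = [x + alpha * rng.gauss(0.0, 1.0) for x in h]
    return h

def deviation(a, b):
    """Euclidean distance between two final hidden states."""
    return math.dist(a, b)

h0 = [0.8, -0.3, 0.5, 0.1]
clean = run(h0, depth=12)
hit_early = run(h0, depth=12, alpha=0.2, noisy_layers=range(0, 4))
hit_late = run(h0, depth=12, alpha=0.2, noisy_layers=range(8, 12))

print(f"early noise, residual deviation: {deviation(clean, hit_early):.5f}")
print(f"late noise, residual deviation:  {deviation(clean, hit_late):.5f}")
```

In this toy, noise injected early is absorbed by the remaining layers, while noise injected at the end is not. Comparing such residual deviations between a DRM model and a standard baseline is one way to operationalize "resilience".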

Good luck with both Aletheion and DRM. The geometric framing is the right direction.

Thanks, Jean. I agree that the right next step is a matched perturbation + recovery test.

One clarification: DRM has already reached a T²-like toroidal closure regime at ~3.5M parameters. It is not yet strict stable closure under the current criterion, since stable remains <= 0.60, but the toroidal signature is already there.

So the contribution is not that DRM has fully stabilized closed geometry yet. The contribution is that a very small model, trained from the beginning inside a geodesic/relational manifold, can already enter an unstable or near-stable T²-like regime.

That is exactly why I think the architecture matters. Standard Transformers may exhibit measurable emergent geometry, but DRM is testing whether topology can be induced directly and then stabilized.

The comparison is interesting, but it’s your hypothesis to validate, not mine. I’ve already demonstrated that emergent geometry is measurable, architecture-specific, and causally manipulable in standard Transformers. Whether DRM improves on this is for you to test. My framework will be available when the papers are out.

Your approach employs a “truth-seeking, ground-seeking, intelligence concept (paradigm).” An alternative is the Entropy Attractor Intelligence Paradigm. Truth correspondence brings with it a gamut of philosophical issues. Try defining intelligence in terms of entropy management and chaos navigation instead. For one, you get to use entropy as your measurement spine. For another, you avoid philosophical issues such as those involving Turing’s Halting Problem, Gödel’s Incompleteness Theorem, and more. You get a “linguistic space (that) does not contain ‘true,’ ‘false,’ and ‘truth,’ with ‘reality’ as either trivial or meaningless, to use Alfred Tarski’s disquotation theory cues, where the boundary between space and cyberspace, to use Norbert Wiener’s parlance, is also treated as trivial or meaningless thanks to Claude Shannon’s formulation of entropy, in the way the boundary between physics and chemistry is treated also as meaningless thanks to the formulation by Ludwig Boltzmann of entropy. In the spirit of Kurt Gödel’s Incompleteness Theorem, Alan Turing’s Halting Problem, and Alonzo Church’s Undecidability of First Order Logic Thesis plus never-ending demands of entropy, … (there are no) metaphysical or ontological claims nor claims to completeness … Physical, informational, and social systems live in one entropy geometry; any boundaries we draw (physics vs chemistry, offline vs online) are memetic/governance conveniences, not ontological walls.”