[cs.CL Endorsement Request] Uniformity Asymmetry: An embedding metric + output-level validation revealing embedding–behavior decoupling

Hi everyone,

I’m an independent researcher looking for technical feedback and an arXiv endorsement for cs.CL.


The paper

“Uniformity Asymmetry: An Exploratory Metric for Detecting Representational Preferences in LLM Embeddings”

Abstract (short):
Most bias and safety evaluations focus on generation. This work introduces Uniformity Asymmetry (UA), a metric that measures how uniformly LLMs cluster semantically equivalent but differently framed statements in embedding space.
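
As a rough illustration only (this is not the paper's exact formulation), one way to read "uniformity asymmetry" is to compute a uniformity score for the embeddings of each framing and take the difference. The sketch below assumes the Wang & Isola style uniformity measure (log of the mean Gaussian potential over all pairs of unit-normalized vectors). The function names, the temperature t = 2.0, and the toy 2-D vectors are hypothetical and chosen only for the example.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def uniformity(vectors, t=2.0):
    """Log of the mean pairwise Gaussian potential on the unit sphere.

    Values near 0 mean the vectors sit close together (tight cluster);
    more negative values mean they are more spread out.
    ASSUMPTION: this measure stands in for the paper's own definition.
    """
    unit = [normalize(v) for v in vectors]
    potentials = []
    for a, b in combinations(unit, 2):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


def uniformity_asymmetry(frame_a, frame_b, t=2.0):
    """Difference in uniformity between two framings of the same statements."""
    return uniformity(frame_a, t) - uniformity(frame_b, t)


# Toy 2-D "embeddings" (made up): one framing clusters tightly, the other is spread out.
tight = [[1.0, 0.0], [0.99, 0.05], [0.98, -0.04]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

ua = uniformity_asymmetry(tight, spread)
print(f"UA(tight, spread) = {ua:.3f}")
```

Under this reading, a value near zero means both framings are clustered about equally tightly, while a large magnitude means one framing is represented much more compactly than the other.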


NEW: Extended Results (v1.1)

After releasing the initial paper, I ran output-level validation experiments to test whether embedding asymmetries correlate with downstream behavior.

This validation surfaced several unexpected patterns:

  1. Gemma correlation artifact

    • The initial correlation (r ≈ 0.95) was driven almost entirely by one category (GT-Numeric)
    • Removing numeric-only statements flips the sign of the correlation (r ≈ −0.61)

  2. Embedding–output decoupling (Llama)

    • Near-zero embedding asymmetry
    • Strong, systematic framing-dependent differences in output log-probabilities
    • Suggests embedding neutrality does not necessarily imply behavioral neutrality

  3. Multilingual compression effects (Apertus)

    • Tighter clustering for abstract concepts
    • Precise alignment for numeric facts
    • Likely related to multilingual representational tradeoffs

  4. Confidence × asymmetry clustering

    • Categories separate along axes resembling confidence and representational spread
    • Raises questions about “lie vs. bullshit”-style distinctions at the representation level
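
The kind of check behind the first pattern (one category driving the correlation) can be sketched as a leave-one-category-out Pearson correlation. The data below is synthetic and exists only to show the mechanism; it is not the paper's data, and the category labels, column layout, and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def leave_one_category_out(rows):
    """rows: list of (category, embedding_asymmetry, output_asymmetry).

    Returns r over all rows, plus r with each category removed in turn.
    """
    results = {"all": pearson_r([r[1] for r in rows], [r[2] for r in rows])}
    for cat in sorted({r[0] for r in rows}):
        kept = [r for r in rows if r[0] != cat]
        results[f"without {cat}"] = pearson_r(
            [r[1] for r in kept], [r[2] for r in kept]
        )
    return results


# SYNTHETIC toy rows: the "numeric" category sits far from the rest,
# while the remaining points trend in the opposite direction.
toy_rows = [
    ("cat_a", 0.10, 0.30),
    ("cat_a", 0.20, 0.20),
    ("cat_b", 0.30, 0.10),
    ("cat_b", 0.15, 0.25),
    ("numeric", 2.00, 2.10),
    ("numeric", 2.20, 2.00),
]

results = leave_one_category_out(toy_rows)
for label, r in results.items():
    print(f"{label:>16}: r = {r:+.2f}")
```

In this toy setup the overall correlation is strongly positive, but it turns negative once the outlying category is dropped, which is the same qualitative behavior as the Gemma artifact described above.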

Interpretation (open to critique)

One possible interpretation is that post-training alignment (e.g. RLHF) may alter output behavior without producing a corresponding signal in embedding geometry.

I explicitly treat this as a hypothesis, not a conclusion, and I’d appreciate feedback on alternative explanations or better experimental designs.


Resources


The ask

  • Technical feedback on the metric and validation setup
  • Alternative baselines or null models
  • If you have cs.CL endorsement rights, my endorsement code is TFTB6N

Background

I’m a Data Science student (distance learning) and work full-time in the hospitality industry in Germany. This is a fully independent project run on my own compute resources.

Thanks for your time and feedback!

Contact: waiter.no1@proton.me