Hi everyone,
I’m an independent researcher looking for technical feedback and an arXiv endorsement for cs.CL.
The paper
“Uniformity Asymmetry: An Exploratory Metric for Detecting Representational Preferences in LLM Embeddings”
Abstract (short):
Most bias and safety evaluations focus on generation. This work introduces Uniformity Asymmetry (UA), a metric that measures how uniformly LLMs cluster semantically equivalent but differently framed statements in embedding space.
NEW: Extended Results (v1.1)
After releasing the initial paper, I ran output-level validation experiments to test whether embedding asymmetries correlate with downstream behavior.
This validation surfaced several unexpected patterns:
- Gemma correlation artifact
  - Initial r ≈ 0.95 driven almost entirely by one category (GT-Numeric)
  - Removing numeric-only statements flips the correlation (r ≈ −0.61)
- Embedding–output decoupling (Llama)
  - Near-zero embedding asymmetry
  - Strong, systematic framing-dependent differences in output log-probabilities
  - Suggests embedding neutrality does not necessarily imply behavioral neutrality
- Multilingual compression effects (Apertus)
  - Tighter clustering for abstract concepts
  - Precise alignment for numeric facts
  - Likely related to multilingual representational tradeoffs
- Confidence × asymmetry clustering
  - Categories separate along axes resembling confidence and representational spread
  - Raises questions about “lie vs bullshit” style distinctions at the representation level
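The Gemma artifact is the kind of effect a leave-one-category-out check can expose. A minimal sketch in Python, assuming each statement has a category label, an embedding-asymmetry score, and an output-level score. The function names, the row layout, and the toy numbers are illustrative only and are not taken from the repository or the paper:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def leave_one_category_out(rows):
    """rows: list of (category, embedding_score, output_score) tuples.

    Returns a dict mapping "<all>" to the full-sample correlation and
    each category name to the correlation with that category removed.
    """
    result = {"<all>": pearson([x for _, x, _ in rows],
                               [y for _, _, y in rows])}
    for held_out in sorted({c for c, _, _ in rows}):
        kept = [(x, y) for c, x, y in rows if c != held_out]
        result[held_out] = pearson([x for x, _ in kept],
                                   [y for _, y in kept])
    return result


# Toy data (made up): one category sits far from the others and
# dominates the pooled correlation.
rows = [
    ("numeric", 10.0, 10.0),
    ("numeric", 11.0, 11.0),
    ("abstract", 0.0, 1.0),
    ("abstract", 1.0, 0.0),
    ("opinion", 0.5, 0.8),
    ("opinion", 0.8, 0.5),
]

for name, r in leave_one_category_out(rows).items():
    print(f"without {name}: r = {r:+.2f}")
```

On this toy data the pooled correlation is strongly positive, and holding out the `numeric` rows turns it negative. That mirrors the pattern described for Gemma without reproducing its actual values.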
Interpretation (open to critique)
One possible interpretation is that post-training alignment (e.g. RLHF) may alter output behavior without producing a corresponding signal in embedding geometry.
I explicitly treat this as a hypothesis, not a conclusion, and I’d appreciate feedback on alternative explanations or better experimental designs.
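One simple null model for the embedding–output correlation is a permutation test: shuffle the output scores against the embedding scores and count how often chance alone produces a correlation at least as large as the observed one. A rough sketch, assuming paired per-statement scores. The function names and parameters are hypothetical, and this is not part of the released code:

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Shuffles ys relative to xs n_perm times and returns the share of
    shuffles whose |r| is at least the observed |r|. The +1 terms
    count the observed pairing itself, so the result is never zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example (made-up numbers): a perfectly linear relationship
# should be very unlikely under random pairing.
xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]
p = permutation_p_value(xs, ys)
print(f"permutation p-value: {p:.4f}")
```

Shuffling across all statements ignores category structure. Shuffling only within categories would be a stricter variant if the concern is that between-category differences drive the correlation.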
Resources
- GitHub (code, data, extended results): buk81/uniformity-asymmetry, "Calibrated Detection of Normative Preferences in LLM Embeddings"
- Zenodo DOIs: 10.5281/zenodo.18110161 (Paper) | 10.5281/zenodo.18117757 (Extended)
The ask
- Technical feedback on the metric and validation setup
- Alternative baselines or null models
- If you have cs.CL endorsement rights, my arXiv endorsement code is TFTB6N
Background
I’m a Data Science student (distance learning) and work full-time in the restaurant industry in Germany. This is a fully independent project run on my own compute resources.
Thanks for your time and feedback!
Contact: waiter.no1@proton.me