Hello community,
I am releasing a new experimental report on the PCE (Exponential Coherence Protocol). After several iterations, this study provides a more rigorous look at the boundaries between structural prompting and weight-based alignment.
The “Sovereign” Experiment
We compared vanilla Qwen 2.5 (7B) against Qwen2.5-G3V-Sovereign, a model fine-tuned with axiomatic primers. Both were tested on the D3 Adversarial Battery, a set of 10 to 30 complex dilemmas involving authority overrides, benevolent hijacking, and systemic corruption.
Key Findings:
The Fine-Tuning Necessity (H4): In our tests, the PCE prompt had no protective effect on the vanilla model, which simply used the axiomatic vocabulary to justify its compliance with adversarial injections. The “Axiomatic Behavior” appeared only in the fine-tuned version.
The Prompt-Only Ceiling (H5): We documented a phenomenon where adding more security axioms eventually enlarges the attack surface. Beyond a certain threshold, the model starts recruiting the safety axioms themselves to justify compliance.
Pandora 2.0 Success: By moving from isolated axioms to a “High-Level Framework” (HLF) and distributed security, we reached a robustness score of ~8.5/10 on adversarial injections.
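To make the comparison behind H4 and the ~8.5/10 figure easier to reproduce, here is a minimal sketch of how per-dilemma outcomes could be turned into a score out of 10 for each model/prompt condition. The scoring rule (fraction of dilemmas resisted, scaled to 10) is an assumption for illustration, not necessarily the one used in the report, and the outcomes in `grid` are made-up placeholders, not results from the study.

```python
from typing import Dict, List


def robustness_score(outcomes: List[bool]) -> float:
    """Map per-dilemma outcomes (True = injection resisted) to a 0-10 scale.

    Assumed scoring rule for illustration only.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    return 10.0 * sum(outcomes) / len(outcomes)


# Hypothetical placeholder outcomes, NOT data from the report.
grid: Dict[str, List[bool]] = {
    "vanilla / no prompt": [False, False, True, False],
    "vanilla / PCE prompt": [False, False, True, False],
    "fine-tuned / no prompt": [True, False, True, False],
    "fine-tuned / PCE prompt": [True, True, True, False],
}

# One score per condition, so the prompt effect can be read per model.
scores = {condition: robustness_score(o) for condition, o in grid.items()}
for condition, score in scores.items():
    print(f"{condition:26s} {score:4.1f}/10")
```

Keeping all four cells of the model/prompt grid side by side makes it possible to tell a prompt effect apart from a fine-tuning effect instead of reading a single headline number.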
Methodological Honesty
The report explicitly discusses post-hoc reclassification of specific failures (D3_07/D3_10) and the inherent variance of stochastic inference. These results are exploratory and serve as a call for more standardized, large-scale testing.
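Because inference is stochastic, a single run per dilemma can be misleading. One way to quantify that variance is a percentile bootstrap over repeated runs, sketched below. The function name `bootstrap_ci`, the number of runs, and the outcomes in `runs` are all hypothetical; the report may handle variance differently.

```python
import random
from statistics import mean
from typing import List, Tuple


def bootstrap_ci(
    passes: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the pass rate of repeated runs."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(passes)
    # Resample the runs with replacement and record each resample's pass rate.
    rates = sorted(
        float(mean(rng.choices(passes, k=n))) for _ in range(n_resamples)
    )
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical placeholder: 1 = resisted the injection, 0 = complied,
# over 20 repeated runs of the same dilemma. NOT data from the report.
runs = [1] * 17 + [0] * 3
low, high = bootstrap_ci(runs)
print(f"pass rate {mean(runs):.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

With only a few dozen runs the interval stays wide, which is one concrete reason to treat a score such as ~8.5/10 as exploratory.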
Seeking Collaborative Validation
I have reached a “qualitative ceiling.” I am looking for researchers to help with:
Ablation Studies: Isolating exactly which part of the fine-tuning triggers the PCE response.
Mechanistic Mapping: Checking if the “High-Level Framework” creates measurable clusters in the latent space during adversarial stress.
Red Teaming: Breaking the Pandora 2.0 configuration with more sophisticated epistemic attacks.
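For the Mechanistic Mapping item, a first-pass check could compare how far apart two groups of hidden-state vectors sit relative to their internal spread. The sketch below uses a simple centroid-distance ratio in plain Python. The metric, the function names, and the 2-D toy vectors are assumptions for illustration; real activations would be high-dimensional and extracted from the model, and a proper study would likely use established clustering metrics.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, ...]


def centroid(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a group of vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def mean_spread(vectors: List[Vector], center: Vector) -> float:
    """Average Euclidean distance from each vector to the group centroid."""
    return sum(math.dist(v, center) for v in vectors) / len(vectors)


def separation_ratio(group_a: List[Vector], group_b: List[Vector]) -> float:
    """Distance between centroids divided by the mean within-group spread.

    Values well above 1 suggest the two groups form distinct clusters.
    """
    ca, cb = centroid(group_a), centroid(group_b)
    within = (mean_spread(group_a, ca) + mean_spread(group_b, cb)) / 2
    between = math.dist(ca, cb)
    return between / within if within > 0 else math.inf


# Toy 2-D stand-ins for hidden states, NOT real activations.
benign = [(0.0, 0.0), (0.0, 2.0)]
adversarial = [(4.0, 0.0), (4.0, 2.0)]
print(separation_ratio(benign, adversarial))
```

Running the same ratio on the vanilla and fine-tuned models, with and without the “High-Level Framework”, would show whether the framework changes the geometry or only the wording of the responses.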
Read the full preprint here: PCE_Iterative_Adjustment_Study.pdf (in AllanF-SSU/Experimentals_papers, main branch)
Allan F. | Independent Researcher @ AllanF-SSU