Paper: https://zenodo.org/records/19530926
A frozen Qwen3 14B (Q4_K_M, Ollama) running on an Apple Mac Mini M4 (24GB) autonomously produced two corrective reasoning constraints through a quality-gated pipeline (MERRCURR), verified against a 110-probe behavioural battery. No fine-tuning, no cloud, no weight modification.
Results:
- Two autonomously promoted rules across independent error classes: one with operational error recurrence 3→0 (Poisson p=0.0498), one with a clean validation delta of +2 and no regressions
- Bare model ablation: 28% → 94% with full architecture (Fisher p<0.001, 66-point delta on 20-probe subset)
- Within-probe dissociation: reasoning accuracy 88–100%, labelling compliance 0% on the same probes (Fisher p<0.001). The model reasons correctly but does not label: reasoning constraints work, but formatting instructions do not in any tested configuration
- Multi-principal transfer: 84%, 83%, 80% across three industries, all CIs overlapping
- 19 days continuous sovereign operation, zero cloud dependency verified in code
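
As a sanity check on the Poisson figure quoted in the results, the reported p=0.0498 is consistent with a one-sided Poisson test in which the pre-promotion count of 3 recurrences is taken as the expected rate and 0 recurrences are observed afterwards. That reading is an assumption made here for illustration (including equal-length observation windows); the paper's exact test setup may differ. The helper names below are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """Probability of k or fewer events under a Poisson rate lam."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Assumption: the baseline count (3) is used as the expected rate for an
# equally long window after the rule was promoted.
baseline_count = 3
observed_count = 0

# One-sided p-value: chance of seeing this few recurrences if nothing changed.
p_value = poisson_cdf(observed_count, baseline_count)
print(f"{p_value:.4f}")  # prints 0.0498
```

With zero observed events the test reduces to exp(-3), so the result depends entirely on treating 3 as the true baseline rate; a small baseline count like this carries considerable uncertainty of its own.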
Fourth paper in the ATLAS research programme. Prior papers: positional restructuring (DOI: 10.5281/zenodo.19427878), calibrated self-assessment (DOI: 10.5281/zenodo.19435861), MERRCURR pipeline (DOI: 10.5281/zenodo.19448879).
Full methodology, statistical tests, probe rubrics, and limitations are in the paper.