Operational Self-Improvement in a Frozen Qwen3 14B on Consumer Hardware — Two Autonomous Rules, 66-Point Ablation Delta, Reasoning Constraints Discovery

Paper: https://zenodo.org/records/19530926

A frozen Qwen3 14B (Q4_K_M, Ollama) running on an Apple Mac Mini M4 (24GB) autonomously produced two corrective reasoning constraints through a quality-gated pipeline (MERRCURR), verified against a 110-probe behavioural battery. No fine-tuning, no cloud, no weight modification.
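The post does not say how the promoted constraints reach the frozen model, so the following is only a minimal sketch of one plausible mechanism: carrying the rules in the system prompt of a local Ollama request. The endpoint and the `model`, `system`, `prompt`, and `stream` fields are Ollama's standard `/api/generate` parameters; the model tag, the helper names `build_request` and `generate`, and the rule format are assumptions for illustration, not the ATLAS implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3:14b"  # assumed tag; use whatever `ollama list` reports locally


def build_request(prompt, constraints):
    """Build a /api/generate payload that carries the rules in the system prompt.

    Hypothetical helper: the weights stay untouched, only the text sent changes.
    """
    system = "\n".join(f"- {rule}" for rule in constraints)
    return {
        "model": MODEL_TAG,
        "system": system,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def generate(prompt, constraints):
    """Send the request to the local Ollama server and return the reply text."""
    body = json.dumps(build_request(prompt, constraints)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Nothing in this sketch leaves the machine: the only network call targets `localhost`, which is consistent with the no-cloud claim but does not by itself verify it.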

Results:

  • Two autonomously promoted rules across independent error classes: one whose operational error recurrence fell from 3 to 0 (Poisson p=0.0498), and one with a clean validation delta of +2 and no regressions

  • Ablation on a 20-probe subset: the bare model scores 28%, the full architecture 94% (66-point delta, Fisher p<0.001)

  • Within-probe dissociation: on the same probes, reasoning accuracy is 88–100% while labelling compliance is 0% (Fisher p<0.001). The model reasons correctly but does not apply the requested labels: reasoning constraints take effect, whereas formatting instructions failed in every tested configuration

  • Multi-principal transfer: 84%, 83%, 80% across three industries, all CIs overlapping

  • 19 days of continuous sovereign operation, with zero cloud dependency verified in code
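The reported Poisson p-value can be checked by hand. This is a sketch under one assumption the post does not spell out: the baseline rate is taken as 3 errors per observation window, and the test asks how likely it is to see 0 errors in an equal window if that rate were unchanged. Under that reading the p-value is simply exp(-3). The function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_p_at_most(k_observed, lam):
    """One-sided p-value: probability of k_observed or fewer events."""
    return sum(poisson_pmf(k, lam) for k in range(k_observed + 1))


# Assumed reading: 3 recurrences before the rule, 0 after, equal windows.
p_value = poisson_p_at_most(0, 3.0)
print(f"{p_value:.4f}")  # 0.0498
```

With only three baseline events the result sits just under the conventional 0.05 threshold, so it is best read alongside the limitations discussed in the paper.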

Fourth paper in the ATLAS research programme. Prior papers: positional restructuring (DOI: 10.5281/zenodo.19427878), calibrated self-assessment (DOI: 10.5281/zenodo.19435861), MERRCURR pipeline (DOI: 10.5281/zenodo.19448879).

Full methodology, statistical tests, probe rubrics, and limitations are in the paper.
