Tiny, synthetic benchmark. Keeping the model/hyperparameters fixed and only expanding the observed information, I see monotone curves (a sketch of the protocol follows the list):
• Forecast (AR3): MSE ↓ with more lags
• Naive Bayes: Accuracy ↑ with more clues
• Planning (Dijkstra): Regret ↓ with more revealed edges
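
A minimal sketch of the protocol, using the forecasting task as the representative case. Everything here is illustrative (coefficients, window sizes, function names), not the repo's actual code: simulate an AR(3) process, fix the model class (plain least-squares linear predictor), and expand only the information set, i.e. the number of observed lags k = 1, 2, 3.

```python
# Sketch only (assumed protocol, illustrative names; not the repo's code):
# fixed model class, expanding information set (k observed lags) on AR(3).
import numpy as np

PHI = np.array([0.5, -0.3, 0.1])  # assumed stationary AR(3) coefficients

def simulate_ar3(n: int, rng: np.random.Generator) -> np.ndarray:
    """AR(3) series with unit Gaussian noise; 100-step burn-in."""
    x = np.zeros(n + 100)
    for t in range(3, len(x)):
        x[t] = PHI @ x[t - 3:t][::-1] + rng.standard_normal()
    return x[100:]

def lagged(x: np.ndarray, k: int):
    """Design matrix of the last k lags, next-step targets."""
    X = np.column_stack([x[k - 1 - j : len(x) - 1 - j] for j in range(k)])
    return X, x[k:]

def test_mse(k: int, seed: int) -> float:
    """Fit on one window, evaluate one-step MSE on a fresh window."""
    rng = np.random.default_rng(seed)
    train, test = simulate_ar3(2000, rng), simulate_ar3(2000, rng)
    Xtr, ytr = lagged(train, k)
    beta, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    Xte, yte = lagged(test, k)
    return float(np.mean((Xte @ beta - yte) ** 2))

for k in (1, 2, 3):  # expanding information: more observed lags
    mse = np.mean([test_mse(k, seed) for seed in range(50)])
    print(f"k={k} lags -> mean test MSE {mse:.4f}")  # should decrease in k
```

The NB and Dijkstra bullets follow the same shape: only the conditioning set grows (more clues, more revealed edges); loss, eval window, and seeds stay pinned.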
Question: beyond finite-sample wiggles, is there a STABLE counterexample where the EXPECTED metric strictly worsens when moving from an information set F to a strictly larger G ⊃ F (same loss, same evaluation window)?
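
For what it's worth, here's how I'd formalize the question (my notation, not the repo's), plus the standard fact about where a counterexample cannot come from:

```latex
% F \subseteq G: sigma-algebras generated by the smaller and larger
% observation sets; \hat{f}_F, \hat{f}_G: outputs of the SAME fixed
% procedure under each; \ell: the shared loss. The question: can
\[
  \mathbb{E}\bigl[\ell(Y, \hat{f}_G)\bigr] \;>\; \mathbb{E}\bigl[\ell(Y, \hat{f}_F)\bigr]
\]
% hold stably? For the Bayes-optimal rule
%   f^*_H = \arg\min_{a \ H\text{-measurable}} \mathbb{E}[\ell(Y, a)],
% F \subseteq G makes f^*_F a feasible G-measurable action, hence
%   \mathbb{E}[\ell(Y, f^*_G)] \le \mathbb{E}[\ell(Y, f^*_F)].
% So any stable counterexample must exploit the fixed (non-Bayes-optimal)
% procedure, not the information expansion itself.
```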
Repo (prereg + CSV + charts): github.com/jacopo992010-oss/ablation-monotony-bench
Release v1 (single ZIP): see Releases
Live demo: AXIOM (Access ⇒ Performance)
If you have a counterexample, I'll replicate it (same seeds, same evaluation window) and add it to the README.
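
Concretely, the replication check I have in mind is a paired, fixed-seed comparison along these lines (`run_procedure` is a placeholder for whatever the submitted counterexample defines; the shape is my assumption, not committed repo code):

```python
# Sketch of the replication check (assumed shape; `run_procedure` is a
# placeholder supplied by the submitted counterexample).
import numpy as np

def replicate(run_procedure, seeds=range(100)):
    """Paired comparison: same seed/window under F and under G = F + extra.

    run_procedure(info, seed) must return the metric estimate
    (lower = better) for information set info in {"F", "G"} on that seed.
    """
    f = np.array([run_procedure("F", s) for s in seeds])
    g = np.array([run_procedure("G", s) for s in seeds])
    diff = g - f  # positive mean would mean more info hurt
    return diff.mean(), diff.std(ddof=1) / np.sqrt(len(diff))

# A counterexample "replicates" if mean(diff) is positive by several SEs.
```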