Abstract
We release Diamond Logic Miner, a deterministic data generator that produces oracle-verified reasoning supervision for training and evaluating LLM reasoning. Unlike LLM-synthetic datasets, Diamond Logic Miner computes every label with a Python oracle (no model-generated answers), enabling auditable ground truth with explicit counterexamples (“witnesses”) and bounded step traces.
Method (Deterministic Oracle Generation)
Each record is produced by:
Procedural instance generation (graphs, DP instances, number theory instances, debugging variants).
Oracle solution computation (mathematically exact).
Optional verify-or-fix packaging: provide a candidate solution and require the model to either confirm correctness (hard positive) or correct it and provide a witness (hard negative).
Generation-time verification (“--verify”): the record is re-verified by the oracle at generation time, preventing silent label corruption.
Deduplication + stable schema + integrity hashes (manifest + SHA256).
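The pipeline above can be sketched end to end. All function names here are illustrative, not the generator's actual API; the instance type (a small 0/1 knapsack) and the hash scheme stand in for the real ones:

```python
import hashlib
import json
import random

def generate_instance(seed: int) -> dict:
    # Hypothetical procedural generator: a small 0/1 knapsack instance.
    rng = random.Random(seed)
    n = rng.randint(3, 5)
    return {
        "weights": [rng.randint(1, 10) for _ in range(n)],
        "values": [rng.randint(1, 10) for _ in range(n)],
        "capacity": rng.randint(5, 15),
    }

def oracle_solve(inst: dict) -> int:
    # Exact DP oracle for 0/1 knapsack (maximum achievable value).
    cap = inst["capacity"]
    best = [0] * (cap + 1)
    for w, v in zip(inst["weights"], inst["values"]):
        for c in range(cap, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[cap]

def make_record(seed: int) -> dict:
    inst = generate_instance(seed)
    answer = oracle_solve(inst)
    # Generation-time re-verification (--verify): recompute and compare
    # before the record is ever written out.
    assert oracle_solve(inst) == answer, "silent label corruption detected"
    record = {"instance": inst, "answer": answer}
    # Integrity hash over a canonical JSON serialization of the record.
    record["sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record
```

Because generation is seeded, the same seed always yields byte-identical records, which is what makes deduplication and manifest hashing meaningful.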
Supervision Format
Records provide:
Ground truth answer (oracle-computed)
A witness for incorrect candidates (a counterexample path, a better solution, or a constraint violation)
Reasoning traces with capped verbosity (e.g., Dijkstra pop/relax steps; DP transition updates), enabling “how-to-think” supervision rather than answer-only training.
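A capped trace of the kind described above might be produced like this; a minimal Dijkstra sketch (not the generator's actual code) that records pop/relax steps up to a verbosity budget:

```python
import heapq

def dijkstra_with_trace(adj, src, trace_cap=20):
    """Dijkstra that records pop/relax steps, up to trace_cap entries.

    adj: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns (distances, trace).
    """
    dist = {src: 0}
    trace = []
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if len(trace) < trace_cap:
            trace.append(f"pop {u} at dist {d}")
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
                if len(trace) < trace_cap:
                    trace.append(f"relax {u}->{v} to {nd}")
    return dist, trace
```

The trace is supervision for *how* the answer was reached; the cap keeps long instances from producing unbounded step logs.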
Tasks Covered
Graph algorithms: Dijkstra shortest path, strongly connected components (SCC), longest path on a DAG
Dynamic Programming: 0/1 Knapsack
Number Theory: Chinese Remainder Theorem
Debugging-style tasks: hard negatives/positives
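As one example of what “mathematically exact” means for these task families, a CRT oracle can be written in a few lines of elementary number theory (this is a generic sketch, not the dataset's implementation):

```python
def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli.

    Returns the unique solution modulo the product of the moduli.
    """
    def ext_gcd(a, b):
        # Extended Euclid: returns (g, x, y) with a*x + b*y == g.
        if b == 0:
            return a, 1, 0
        g, x, y = ext_gcd(b, a % b)
        return g, y, x - (a // b) * y

    def inv(a, m):
        # Modular inverse of a mod m; fails if gcd(a, m) != 1.
        g, x, _ = ext_gcd(a % m, m)
        if g != 1:
            raise ValueError("moduli not pairwise coprime")
        return x % m

    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * inv(Mi, m)
    return x % M
```

An answer like this is trivially checkable (reduce modulo each m_i), which is exactly the property that makes oracle-side verification cheap.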
Release Artifacts
Preview (public):
Pilot Pack (gated):
Pilot pack properties (generation run):
Size: 1,000,000 unique records
Format: JSONL, gzip-compressed shards
Shards: 40 (shard-00000 … shard-00039)
Compressed size: ~0.92 GiB
Enterprise ingestion: includes datasheet.md, manifest.json, sha256sums.txt
Intended Use-Cases
Post-training / SFT for verification behavior
Verify-or-fix supervision teaches the model to discriminate: “confirm if correct” vs “correct with proof if wrong”, reducing reliance on surface heuristics.
Evaluation of reasoning reliability
Because each instance has oracle ground truth and witnesses, the dataset supports reproducible evaluation of “verification accuracy” and failure-mode analysis.
Preference / reward-style supervision without human labeling
Witnesses naturally define preference signals (correct vs plausible-wrong candidate), enabling preference data construction or reward modeling for verification.
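The construction described above can be sketched as a record-to-pair transform. Field names here (`candidate_is_correct`, `witness`, `answer`, `problem`) are illustrative, not the dataset's actual schema:

```python
def to_preference_pair(record: dict) -> dict:
    """Turn a verify-or-fix record into a (chosen, rejected) pair.

    The oracle label decides which verdict is preferred; the witness
    grounds the preferred response when the candidate is wrong.
    """
    if record["candidate_is_correct"]:
        chosen = "The candidate solution is correct."
        rejected = "The candidate solution is wrong."
    else:
        chosen = (
            f"The candidate is wrong; a witness is {record['witness']} "
            f"and the correct answer is {record['answer']}."
        )
        rejected = "The candidate solution is correct."
    return {"prompt": record["problem"], "chosen": chosen, "rejected": rejected}
```

Because the label comes from the oracle rather than a human or model annotator, the resulting preference data inherits the dataset's auditability.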
Curriculum & controlled experiments
Difficulty scaling + trace caps support controlled studies of multi-step reasoning and generalization.
Limitations / Notes
This dataset targets logical/mathematical reasoning and verification, not world knowledge pretraining.
Viewer availability on the Hub may depend on post-processing; the dataset remains fully usable via standard file download or the Hugging Face `datasets` library.
Feedback and evaluation results are welcome (e.g., zero-shot vs LoRA on verify-or-fix tasks, witness utilization ablations, trace-budget sensitivity).