Cannot replicate leaderboard MATH scores

Can somebody point me to the exact GitHub commit of lm-eval and the command-line options used for MATH? Here is my attempt with the latest commit of lm-evaluation-harness, and I cannot get from these numbers to the number posted on the leaderboard.

$ lm_eval --model hf --model_args pretrained=nvidia/AceMath-1.5B-Instruct --tasks leaderboard_math_hard --device cuda:0 --batch_size auto --trust_remote_code --log_samples --output_path foo-results
|                    Tasks                    |Version|Filter|n-shot|  Metric   |   |Value |   |Stderr|  N|
|---------------------------------------------|-------|------|-----:|-----------|---|-----:|---|-----:|--:|
|leaderboard_math_hard                        |    N/A|      |      |           |   |      |   |      |   |
| - leaderboard_math_algebra_hard             |      2|none  |     4|exact_match|↑  |0.5407|±  |0.0285|307|
| - leaderboard_math_counting_and_prob_hard   |      2|none  |     4|exact_match|↑  |0.3008|±  |0.0415|123|
| - leaderboard_math_geometry_hard            |      2|none  |     4|exact_match|↑  |0.1212|±  |0.0285|132|
| - leaderboard_math_intermediate_algebra_hard|      2|none  |     4|exact_match|↑  |0.1000|±  |0.0180|280|
| - leaderboard_math_num_theory_hard          |      2|none  |     4|exact_match|↑  |0.2078|±  |0.0328|154|
| - leaderboard_math_prealgebra_hard          |      2|none  |     4|exact_match|↑  |0.5026|±  |0.0361|193|
| - leaderboard_math_precalculus_hard         |      2|none  |     4|exact_match|↑  |0.1185|±  |0.0279|135|
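For reference, here is a minimal sketch of the two obvious ways to aggregate these subtask scores into a single MATH number, assuming the final count on each row is that subtask's sample size. I am not certain which aggregation the leaderboard uses, and the v2 leaderboard may also apply score normalization on top of the raw average:

```python
# Aggregate per-subtask exact_match scores two ways: a macro average
# (plain mean over subtasks) and a micro average (weighted by sample count).
# Scores and sample counts are copied from the lm-eval table above.
scores = {
    # subtask: (exact_match, n_samples)
    "algebra":              (0.5407, 307),
    "counting_and_prob":    (0.3008, 123),
    "geometry":             (0.1212, 132),
    "intermediate_algebra": (0.1000, 280),
    "num_theory":           (0.2078, 154),
    "prealgebra":           (0.5026, 193),
    "precalculus":          (0.1185, 135),
}

# Macro average: unweighted mean of the subtask scores.
macro = sum(em for em, _ in scores.values()) / len(scores)

# Micro average: each subtask weighted by its number of samples.
total_n = sum(n for _, n in scores.values())
micro = sum(em * n for em, n in scores.values()) / total_n

print(f"macro exact_match: {macro:.4f}")  # 0.2702
print(f"micro exact_match: {micro:.4f}")  # 0.2961
```

Neither of these is guaranteed to match the leaderboard's aggregation, but they bracket what a simple average of the table above can produce.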


Oh… perhaps it was a glitch that has since been fixed?