Can somebody point to the exact GitHub commit of lm-evaluation-harness and the command-line options used for MATH? Here is my attempt with the latest commit of lm-evaluation-harness; I cannot get from these numbers to the number posted on the leaderboard.
$ lm_eval --model hf --model_args pretrained=nvidia/AceMath-1.5B-Instruct --tasks leaderboard_math_hard --device cuda:0 --batch_size auto --trust_remote_code --log_samples --output_path foo-results
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|Samples|
|---------------------------------------------|-------|------|-----:|-----------|---|-----:|---|-----:|------:|
|leaderboard_math_hard | N/A| | | | | | | | |
| - leaderboard_math_algebra_hard | 2|none | 4|exact_match|↑ |0.5407|± |0.0285|307|
| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match|↑ |0.3008|± |0.0415|123|
| - leaderboard_math_geometry_hard | 2|none | 4|exact_match|↑ |0.1212|± |0.0285|132|
| - leaderboard_math_intermediate_algebra_hard| 2|none | 4|exact_match|↑ |0.1000|± |0.0180|280|
| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match|↑ |0.2078|± |0.0328|154|
| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match|↑ |0.5026|± |0.0361|193|
| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match|↑ |0.1185|± |0.0279|135|
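
To make the comparison concrete, here is a rough sketch of how the seven subtask scores could be collapsed into a single number. It assumes the last number on each row (307, 123, ...) is that subtask's sample count, and since I don't know whether the leaderboard uses a sample-weighted or a plain mean (or applies further normalization), it computes both. The variable names are made up for illustration.

```python
# Subtask exact_match scores and sample counts copied from the table above.
# Assumption: the last number on each row is the number of samples.
results = {
    "algebra":              (0.5407, 307),
    "counting_and_prob":    (0.3008, 123),
    "geometry":             (0.1212, 132),
    "intermediate_algebra": (0.1000, 280),
    "num_theory":           (0.2078, 154),
    "prealgebra":           (0.5026, 193),
    "precalculus":          (0.1185, 135),
}

total_samples = sum(n for _, n in results.values())

# Sample-weighted mean: each subtask counts in proportion to its size.
weighted_mean = sum(acc * n for acc, n in results.values()) / total_samples

# Plain (unweighted) mean: each subtask counts equally.
unweighted_mean = sum(acc for acc, _ in results.values()) / len(results)

print(f"total samples:   {total_samples}")
print(f"weighted mean:   {weighted_mean:.4f}")
print(f"unweighted mean: {unweighted_mean:.4f}")
```

With these inputs the weighted mean comes out around 0.296 and the unweighted mean around 0.270, so whichever aggregation is used, that is the range to compare against the leaderboard figure.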