HFLM Python integration way slower than CLI

A likely cause: when you pass an already-instantiated model object, the batch_size and max_batch_size arguments to simple_evaluate are not applied to it (they are only used when the harness instantiates the model itself from a string), so HFLM falls back to its default batch size of 1. Setting the batch size on the HFLM constructor instead may avoid the slowdown:

from lm_eval import evaluator
from lm_eval.models.huggingface import HFLM

eval_model = HFLM(
    pretrained="models/Llama-2-7b-hf",
    dtype="bfloat16",
    device="cuda",
    # Set the batch size here: batch_size passed to simple_evaluate is
    # ignored when the model is already instantiated.
    batch_size=64,
    # device_map="auto",
)
results = evaluator.simple_evaluate(
    model=eval_model,
    tasks=[
        "arc_easy",
        "arc_challenge",
        "hellaswag",
        "lambada_openai",
        "lambada_standard",
        "openbookqa",
        "piqa",
        "winogrande",
    ],
)