HFLM Python integration way slower than CLI

Hello, I'm currently trying to evaluate some models. When I do it with the following script, it takes about 5 minutes (which is fine):

#!/bin/bash
#SBATCH --account bsc03
#SBATCH --qos=acc_debug
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=80
#SBATCH --time 00:15:00

export NUMEXPR_MAX_THREADS=128
export CUDA_VISIBLE_DEVICES=0

model="models/Llama-2-7b-hf"
task="arc_easy,arc_challenge,hellaswag,lambada_openai,lambada_standard,openbookqa,piqa,winogrande"
lm_eval --model hf \
        --model_args pretrained=$model,dtype=bfloat16,device_map=auto \
        --tasks $task \
        --batch_size 64

However, when I use the HFLM class it takes more than 20 minutes, even though I am presumably doing the same thing:

import transformers
from lm_eval import evaluator
from lm_eval.models.huggingface import HFLM

eval_model = HFLM(
    pretrained="models/Llama-2-7b-hf",
    dtype="bfloat16",
    device_map="auto",
)
results = evaluator.simple_evaluate(
    model=eval_model,
    tasks=["arc_easy","arc_challenge","hellaswag","lambada_openai","lambada_standard","openbookqa","piqa","winogrande"],
    batch_size=64,
)
print(results['results'])

I want to use HFLM because I intend to pass an already-loaded model, and it's way more convenient than saving the model and calling the CLI script.

Any advice? What am I doing wrong? Is this a bug?
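One thing that might help narrow this down is timing each phase separately, to see whether the extra 15 minutes go into model loading, HFLM construction, or the evaluation itself. A minimal stdlib sketch (`timed` is a hypothetical helper, not part of lm-eval):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Wall-clock timer: wrap each phase to see which one is slow.
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.1f}s")

# Usage sketch with the HFLM setup from above:
# with timed("HFLM init"):
#     eval_model = HFLM(pretrained="models/Llama-2-7b-hf", dtype="bfloat16", device_map="auto")
# with timed("simple_evaluate"):
#     results = evaluator.simple_evaluate(model=eval_model, tasks=[...], batch_size=64)
```

If the gap is entirely inside `simple_evaluate`, the model placement or batching differs from the CLI run; if it's in `HFLM` init, it's a loading issue.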

Thanks


This may help avoid some of the problems.

eval_model = HFLM(
    pretrained="models/Llama-2-7b-hf",
    dtype="bfloat16",
    device="cuda",
    #device_map="auto",
)
results = evaluator.simple_evaluate(
    model=eval_model,
    tasks=["arc_easy","arc_challenge","hellaswag","lambada_openai","lambada_standard","openbookqa","piqa","winogrande"],
    batch_size=64,
    max_batch_size=64,
)

Thank you for your message! However, your code still takes about 20 minutes to run. The CLI script takes about 5. I don’t know what is going on.

Thanks!


Hmm… It seems like the CLI one is running with accelerate, so how about trying to imitate that?

accelerate launch --num_processes 1 eval.py

Hi, I just tried it. Same thing: it still takes 20 minutes instead of 5.
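Since accelerate didn't change anything, it might be worth checking where `device_map="auto"` actually placed the weights: silent CPU offload of some layers would easily explain a 4x slowdown versus an all-GPU run. A minimal sketch (`device_histogram` is a hypothetical helper; it only assumes objects exposing `.device` and `.numel()`, like PyTorch parameters):

```python
from collections import Counter

def device_histogram(parameters):
    # Count parameters per device for any iterable of objects with
    # .device and .numel() (e.g. model.parameters() in PyTorch).
    # Any weights landing on "cpu" point to accelerate offloading.
    counts = Counter()
    for p in parameters:
        counts[str(p.device)] += p.numel()
    return dict(counts)

# Usage sketch (the attribute holding the underlying transformers model
# may differ between lm-eval versions, e.g. eval_model.model):
# print(device_histogram(eval_model.model.parameters()))
```

If everything reports `cuda:0`, placement is fine and the difference is more likely in batching or data handling.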
