Hello, I am currently trying to evaluate some models. When I do it with the following script, it takes about 5 minutes (which is fine):
#!/bin/bash
#SBATCH --account bsc03
#SBATCH --qos=acc_debug
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=80
#SBATCH --time 00:15:00
export NUMEXPR_MAX_THREADS=128
export CUDA_VISIBLE_DEVICES=0
model="models/Llama-2-7b-hf"
task="arc_easy,arc_challenge,hellaswag,lambada_openai,lambada_standard,openbookqa,piqa,winogrande"
lm_eval --model hf \
    --model_args pretrained=$model,dtype=bfloat16,device_map=auto \
    --tasks $task \
    --batch_size 64
However, when using the HFLM class, it takes more than 20 minutes, even though I am presumably doing the same thing:
from lm_eval import evaluator
from lm_eval.models.huggingface import HFLM

eval_model = HFLM(
    pretrained="models/Llama-2-7b-hf",
    dtype="bfloat16",
    device_map="auto",
)
results = evaluator.simple_evaluate(
    model=eval_model,
    tasks=["arc_easy", "arc_challenge", "hellaswag", "lambada_openai",
           "lambada_standard", "openbookqa", "piqa", "winogrande"],
    batch_size=64,
)
print(results['results'])
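For reference, as far as I can tell from the lm_eval entry point (this is my assumption about the equivalence, not something I found documented), the CLI run above amounts to the same call with everything passed as strings:

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=models/Llama-2-7b-hf,dtype=bfloat16,device_map=auto",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "lambada_openai",
           "lambada_standard", "openbookqa", "piqa", "winogrande"],
    batch_size=64,
)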
I want to use HFLM because I intend to pass an already-loaded model, and it's way more convenient than saving the model to disk and calling the CLI script.
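Concretely, what I ultimately want looks like the sketch below; I'm assuming HFLM accepts an already-instantiated PreTrainedModel (plus a tokenizer) as pretrained, which its signature suggests:

import torch
import transformers
from lm_eval import evaluator
from lm_eval.models.huggingface import HFLM

# In my real code this model object already exists in memory (e.g. mid-training);
# it is loaded here only to keep the sketch self-contained.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "models/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = transformers.AutoTokenizer.from_pretrained("models/Llama-2-7b-hf")

# Wrap the in-memory model instead of a path string
eval_model = HFLM(pretrained=model, tokenizer=tokenizer)
results = evaluator.simple_evaluate(model=eval_model, tasks=["piqa"], batch_size=64)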
Any advice? What am I doing wrong? Is this a bug?
Thanks