I’m trying to evaluate LLMs on the cais/mmlu dataset using the Hugging Face pipeline API. Currently, I merge each question with its answer options and pass the combined Question + Options prompt to the model. However, there are two problems with this workflow:
- Inference is extremely slow: a single query takes > 60 s for a 13B model on 8x A6000 GPUs.
- The model’s output doesn’t always exactly match one of the options in the multiple-choice set.
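For reference, here is a minimal sketch of my prompt construction. The sample record mimics the cais/mmlu schema (`question`, `choices`, `answer`); in the real run the records come from `load_dataset("cais/mmlu", ...)`, and the pipeline call is shown only as a comment:

```python
# Letter labels for the four MMLU answer choices
LETTERS = ["A", "B", "C", "D"]

def format_mmlu_prompt(example):
    """Merge a question with its lettered options into a single prompt string."""
    options = "\n".join(
        f"{letter}. {choice}"
        for letter, choice in zip(LETTERS, example["choices"])
    )
    return f"{example['question']}\n{options}\nAnswer:"

# Toy record in the cais/mmlu format (real data comes from load_dataset)
sample = {
    "question": "What is 2 + 2?",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,
}
prompt = format_mmlu_prompt(sample)

# The prompt is then fed to a text-generation pipeline, roughly:
#   pipe = pipeline("text-generation", model=..., device_map="auto")
#   out = pipe(prompt, max_new_tokens=5)
```

The model is expected to answer with one of the option letters, but as noted above, its free-form output often doesn’t match any option exactly.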
How can I overcome these issues? Does anyone have any suggestions?