How to evaluate CLMs on MMLU?

surya-narayanan · September 12, 2023, 1:05am

Hi folks,

I’m trying to evaluate LLMs on the cais/mmlu dataset using a pipeline. Right now, I merge each question with its options and pass the merged Question + Options string as input to an LLM wrapped in the pipeline API. However, this workflow has two problems:

1. Inference is extremely slow (more than 60 s per query for a 13B model on 8x A6000 GPUs).
2. Model outputs don’t always exactly match one of the options in the multiple-choice set.
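
For reference, the merging step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code in use: the helper name `build_prompt`, the lettered "A. / B. / C. / D." layout, the trailing "Answer:" cue, and the sample row are all assumptions made for the example. The field names `question` and `choices` are assumed to match the cais/mmlu schema.

```python
from typing import Sequence

# Option labels for a four-way multiple-choice question (assumed layout).
LETTERS = "ABCD"


def build_prompt(question: str, choices: Sequence[str]) -> str:
    """Merge a question and its options into one prompt string.

    Hypothetical helper: the exact prompt format is an assumption,
    not something prescribed by the dataset or the pipeline API.
    """
    lines = [question.strip()]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    # Cue the model to reply with a single option letter.
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up row for illustration; real rows would come from the dataset.
example = {"question": "What is 2 + 2?", "choices": ["3", "4", "5", "6"]}
prompt = build_prompt(example["question"], example["choices"])
print(prompt)
```

The resulting string is what gets passed to the pipeline as a single input.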

How can I overcome these issues? Does anyone have any suggestions?
