I have recently trained a GPT-2 model from scratch. I now want to evaluate it on two popular benchmarks that are commonly used to assess a model's general knowledge and reasoning abilities: AGIEval and MMLU. How do I do this?
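For reference, multiple-choice benchmarks like MMLU are commonly scored zero- or few-shot. Each question is formatted as a prompt ending in "Answer:", the model scores each candidate letter as a continuation, and the highest-scoring letter counts as its prediction. The sketch below shows only that scoring loop. It assumes a hypothetical `score_fn(prompt, continuation)` that returns the model's log-likelihood of `continuation` given `prompt`. With a real GPT-2 checkpoint, you would implement it by summing the token log-probabilities from the model's logits, which is not shown here. The example field names (`subject`, `question`, `choices`, `answer`) are also assumptions, not the official dataset schema.

```python
from typing import Callable, Dict, List, Sequence

# Letters used to label answer options (MMLU uses 4; some AGIEval tasks use 5).
CHOICE_LETTERS = "ABCDE"

# Hypothetical scorer: log-likelihood of `continuation` given `prompt`.
ScoreFn = Callable[[str, str], float]


def format_mmlu_prompt(subject: str, question: str, choices: Sequence[str]) -> str:
    """Build a zero-shot prompt in the MMLU style: header, question, lettered options, 'Answer:'."""
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{subject.replace('_', ' ')}.\n\n"
    )
    lines = [question]
    for letter, choice in zip(CHOICE_LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return header + "\n".join(lines)


def predict(prompt: str, num_choices: int, score_fn: ScoreFn) -> str:
    """Return the option letter whose continuation (e.g. ' B') the model scores highest."""
    letters = CHOICE_LETTERS[:num_choices]
    scores = {letter: score_fn(prompt, " " + letter) for letter in letters}
    return max(scores, key=scores.get)


def accuracy(examples: List[Dict], score_fn: ScoreFn) -> float:
    """Fraction of examples where the predicted letter matches the gold answer letter."""
    correct = 0
    for ex in examples:
        prompt = format_mmlu_prompt(ex["subject"], ex["question"], ex["choices"])
        pred = predict(prompt, len(ex["choices"]), score_fn)
        correct += int(pred == ex["answer"])
    return correct / len(examples)
```

To try the loop without a model, you can pass a dummy scorer that always prefers `" B"`. In practice, an existing tool such as EleutherAI's lm-evaluation-harness, which ships MMLU and AGIEval task definitions, is usually easier than maintaining your own prompts. It also keeps your numbers comparable with published results.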