Calculating Perplexity for Quantized Llama 3 8B & Mistral 7B Models: Evaluate Library vs. Custom Code?

Hi everyone,

I have quantized versions of the Llama 3 8B and Mistral 7B models, and I’m looking to calculate their perplexity (PPL). I came across the Hugging Face Evaluate library, but I’m a bit confused about how best to use it for this purpose.
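For context, here is roughly what I tried with the Evaluate library (the model path and the WikiText-2 slice are just placeholders; I'm not sure this is the intended usage for quantized checkpoints):

```python
# Sketch: perplexity via the Evaluate library. The metric loads the model
# itself from a model_id or local path with AutoModelForCausalLM, so the
# quantized checkpoint must be loadable that way on its own.
import evaluate
from datasets import load_dataset

perplexity = evaluate.load("perplexity", module_type="metric")

# A small slice of WikiText-2 test text as example inputs.
texts = [
    t
    for t in load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
    if t.strip()
][:200]

results = perplexity.compute(
    model_id="path/or/hub-id-of-quantized-llama-3-8b",  # placeholder
    predictions=texts,
    batch_size=4,
    add_start_token=True,
)
print(results["mean_perplexity"])
```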

Has anyone used the Evaluate library to calculate perplexity on quantized models like these? Does the library support quantized checkpoints at all? Alternatively, do you recommend writing custom evaluation code for more accurate or tailored results?
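In case custom code is the better route, this is the rough sliding-window loop I was considering, based on the fixed-length perplexity recipe in the Transformers docs (the 4-bit BitsAndBytesConfig is only an assumption about how my checkpoints were quantized, and the model id is a placeholder):

```python
# Sketch: custom perplexity over WikiText-2 with a sliding window.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")

max_length = 4096  # evaluation context window
stride = 512       # step between windows; the overlap is used only as context
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens actually scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the context-only prefix

    with torch.no_grad():
        # loss is the mean NLL over the unmasked target tokens
        loss = model(input_ids, labels=target_ids).loss
    # re-weight so windows with different numbers of scored tokens average
    # correctly (approximate: ignores the one-token shift per window)
    nlls.append(loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"Perplexity: {ppl.item():.2f}")
```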

Any insights, suggestions, or code examples would be greatly appreciated!

Thanks in advance for your help.

@Wauplin @lhoestq @mariosasko @nielsr

Could you please share your expert opinions?

It seems that an error occurs when you run it as-is on a quantized model. I think it will work if you dequantize the model once first, or if the library is fixed to handle quantized checkpoints…
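A minimal sketch of the dequantize-first workaround for a bitsandbytes-quantized model (this assumes your installed transformers version exposes `PreTrainedModel.dequantize()` for bitsandbytes models; the model id and output directory are placeholders):

```python
# Sketch: dequantize a bitsandbytes-quantized model and save it so that
# evaluate's perplexity metric can load the checkpoint by path.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Convert the 4-bit weights back to regular floating-point tensors in place.
# The quantization error stays baked into the weights, so perplexity measured
# on the dequantized model still reflects the quantized model's quality.
model.dequantize()
model.save_pretrained("llama-3-8b-dequantized")  # placeholder output dir
```

Keep in mind the dequantized model takes much more memory and disk space than the 4-bit one.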