Calculating Perplexity for Quantized Llama 3 8B & Mistral 7B Models: Evaluate Library vs. Custom Code?

Hi everyone,

I have quantized versions of the Llama 3 8B and Mistral 7B models, and I’m looking to calculate their perplexity (PPL). I came across the Hugging Face Evaluate library, but I’m a bit confused about how best to use it for this purpose.
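For context, here is roughly what I tried with the Evaluate library (the model path and the WikiText-2 slice are just placeholders; I'm not sure this is the intended usage for quantized checkpoints):

```python
# Sketch: perplexity via the Evaluate library. The metric loads the model
# itself from a model_id or local path with AutoModelForCausalLM, so the
# quantized checkpoint must be loadable that way on its own.
import evaluate
from datasets import load_dataset

perplexity = evaluate.load("perplexity", module_type="metric")

# A small slice of WikiText-2 test text as example inputs.
texts = [
    t
    for t in load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
    if t.strip()
][:200]

results = perplexity.compute(
    model_id="path/or/hub-id-of-quantized-llama-3-8b",  # placeholder
    predictions=texts,
    batch_size=4,
    add_start_token=True,
)
print(results["mean_perplexity"])
```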

Has anyone used the Evaluate library to calculate perplexity on quantized models like these? Does the library support quantized checkpoints at all? Alternatively, do you recommend writing custom evaluation code for more accurate or tailored results?
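In case custom code is the better route, this is the rough sliding-window loop I was considering, based on the fixed-length perplexity recipe in the Transformers docs (the 4-bit BitsAndBytesConfig is only an assumption about how my checkpoints were quantized, and the model id is a placeholder):

```python
# Sketch: custom perplexity over WikiText-2 with a sliding window.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")

max_length = 4096  # evaluation context window
stride = 512       # step between windows; the overlap is used only as context
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens actually scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the context-only prefix

    with torch.no_grad():
        # loss is the mean NLL over the unmasked target tokens
        loss = model(input_ids, labels=target_ids).loss
    # re-weight so windows with different numbers of scored tokens average
    # correctly (approximate: ignores the one-token shift per window)
    nlls.append(loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"Perplexity: {ppl.item():.2f}")
```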

Any insights, suggestions, or code examples would be greatly appreciated!

Thanks in advance for your help.

@Wauplin @lhoestq @mariosasko @nielsr

Could you please share your expert opinions?

It seems that an error occurs when you run it as-is on a quantized model. I think it will work if you dequantize the model once first, or if the library is fixed to handle quantized checkpoints…
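A minimal sketch of the dequantize-first workaround for a bitsandbytes-quantized model (this assumes your installed transformers version exposes `PreTrainedModel.dequantize()` for bitsandbytes models; the model id and output directory are placeholders):

```python
# Sketch: dequantize a bitsandbytes-quantized model and save it so that
# evaluate's perplexity metric can load the checkpoint by path.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Convert the 4-bit weights back to regular floating-point tensors in place.
# The quantization error stays baked into the weights, so perplexity measured
# on the dequantized model still reflects the quantized model's quality.
model.dequantize()
model.save_pretrained("llama-3-8b-dequantized")  # placeholder output dir
```

Keep in mind the dequantized model takes much more memory and disk space than the 4-bit one.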