I have quantized versions of the Llama 3 8B and Mistral 7B models, and I’m looking to calculate their perplexity (PPL). I came across the Hugging Face Evaluate library, but I’m a bit confused about how best to use it for this purpose.
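For context, this is the usage pattern I’ve seen in the Evaluate docs for the perplexity metric (the `"gpt2"` model ID below is just the placeholder from the docs). As far as I can tell, `compute` loads the model by ID internally, which is why I’m unsure how an already-quantized local checkpoint fits in:

```python
import evaluate

# Load the perplexity metric from the Evaluate library.
perplexity = evaluate.load("perplexity", module_type="metric")

# `compute` loads the model itself from the given ID, so it's unclear
# (to me) how to point it at a locally quantized model instead.
results = perplexity.compute(
    model_id="gpt2",
    predictions=["The quick brown fox jumps over the lazy dog."],
)
print(results["mean_perplexity"])
```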
Has anyone used the Evaluate library to calculate perplexity on similar quantized models? Does the library support quantized checkpoints like these? Alternatively, would you recommend writing custom evaluation code (something like the sketch below) for more accurate or tailored results?
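To clarify what I mean by custom code, here is the kind of sliding-window loop I had in mind, adapted from the Hugging Face perplexity guide. It’s a rough sketch I haven’t validated on my setup: it assumes a bitsandbytes 4-bit load, and the model ID, evaluation text, `max_length`, and `stride` are placeholders to adjust.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder checkpoint; swap in the actual quantized model.
model_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumes bitsandbytes 4-bit quantization; adjust to your quantization scheme.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model.eval()

# In practice this would be the full evaluation corpus joined into one string.
text = "Replace this with your evaluation text."
encodings = tokenizer(text, return_tensors="pt")

max_length = 2048  # scoring context window (placeholder)
stride = 512       # step size; the overlap gives each token left context
seq_len = encodings.input_ids.size(1)

nll_sum = 0.0
n_tokens = 0
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # tokens newly scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask context-only tokens from the loss

    with torch.no_grad():
        # outputs.loss is the mean negative log-likelihood over unmasked,
        # shifted target tokens
        outputs = model(input_ids, labels=target_ids)

    # Count only tokens that actually contribute to the loss after the
    # model's internal one-position label shift.
    n_valid = (target_ids[:, 1:] != -100).sum().item()
    nll_sum += outputs.loss.item() * n_valid
    n_tokens += n_valid

    prev_end = end
    if end == seq_len:
        break

# Perplexity = exp of the average per-token negative log-likelihood.
ppl = torch.exp(torch.tensor(nll_sum / n_tokens))
print(f"Perplexity: {ppl.item():.2f}")
```

The windowed masking follows the approach in the HF perplexity guide, so each token is scored with up to `max_length` tokens of preceding context, but I’m not sure whether this is the right baseline to compare against the Evaluate metric on quantized models.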
Any insights, suggestions, or code examples would be greatly appreciated!