Evaluating quantized models yields identical results across bit precisions

I have been benchmarking some models across different bit precisions, using Optimum Quanto and BitsAndBytes for quantization.
However, when I call trainer.evaluate(), the metrics are nearly identical between the quantized models and the float models (even at 2-bit integer precision), which seems very unlikely to me. Fine-tuning the quantized models does yield different results. My hunch is that trainer.evaluate() is not actually using the quantized layers, or something along those lines, since the same behavior shows up across different models and different PTQ methods.
To quantize, I use optimum.quanto.quantize(model) and optimum.quanto.freeze(model).
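
For context, a stripped-down sketch of that quantization step (the model name, the qint2 weight type, and the dummy inputs below are just placeholders for illustration; the comparison against a float copy is only meant as a quick check that the quantized layers actually change the forward pass):

```python
import copy
import torch
from transformers import AutoModelForSequenceClassification
from optimum.quanto import quantize, freeze, qint2

# Placeholder model; in the benchmarks this is swapped for the models under test.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
float_model = copy.deepcopy(model)  # keep a float copy for comparison

# Replace supported layers with quantized equivalents (weights-only here),
# then freeze() to materialize the integer weights.
quantize(model, weights=qint2)  # qint4 / qint8 for the other precisions
freeze(model)

# Heuristic check: the quantized modules (QLinear, etc.) should be present.
n_q = sum(1 for m in model.modules() if type(m).__name__.startswith("Q"))
print(f"quantized modules: {n_q}")

# A single forward pass at 2-bit precision should differ noticeably from the
# float copy (dummy token ids, just for illustration).
dummy = {"input_ids": torch.randint(0, 1000, (1, 16)),
         "attention_mask": torch.ones(1, 16, dtype=torch.long)}
with torch.no_grad():
    q_logits = model(**dummy).logits
    f_logits = float_model(**dummy).logits
print("max abs diff in logits:", (q_logits - f_logits).abs().max().item())
```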

Does anyone have an idea why this happens?

As a side note, to be able to call trainer.evaluate() on quantized models at all, I created a subclass of the Trainer class to circumvent training-related errors (sketched below).
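
Roughly, the idea of that subclass is to skip the training-specific setup so that only evaluate() is exercised. A simplified sketch (not the exact class; the overrides actually needed may differ depending on the transformers version and which training-related check fails):

```python
from transformers import Trainer

class EvalOnlyTrainer(Trainer):
    """Evaluation-only Trainer: skips optimizer/scheduler creation so that
    quantized models can be evaluated without training-specific setup."""

    def create_optimizer(self):
        # No optimizer is needed, since this trainer is only used for evaluate().
        self.optimizer = None
        return self.optimizer

    def create_scheduler(self, num_training_steps, optimizer=None):
        # Likewise, no learning-rate scheduler is needed.
        self.lr_scheduler = None
        return self.lr_scheduler

# Usage is the same as with the regular Trainer, e.g.:
# trainer = EvalOnlyTrainer(model=model, args=eval_args, eval_dataset=eval_ds)
# metrics = trainer.evaluate()
```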

Figure with results for reference: