Difference in the vectors generated by the int8 quantized model vs the base ONNX model

Recently I tried to compare the BAAI/bge-m3 ONNX model with its int8 AVX2 quantized version. There is a huge difference between the vectors generated by the base ONNX model and those generated by the int8 AVX2 quantized model. The difference remained large even when I tried other quantization instruction sets such as AVX512 and AVX512_VNNI.
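
For reference, this is roughly the kind of comparison I ran (a minimal sketch; the output file names and the CLS-pooling plus L2-normalization step are assumptions about how the bge-m3 dense embedding is produced):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
base_sess = ort.InferenceSession("bge-m3-base-onnx/model.onnx")
quant_sess = ort.InferenceSession("bge-m3-int8-avx2/model_quantized.onnx")

def embed(session, text):
    inputs = tokenizer(text, return_tensors="np")
    # Feed only the inputs that the exported graph actually declares.
    feed = {k: v for k, v in inputs.items()
            if k in {i.name for i in session.get_inputs()}}
    last_hidden = session.run(None, feed)[0]
    vec = last_hidden[:, 0]  # CLS pooling
    return vec / np.linalg.norm(vec, axis=-1, keepdims=True)

a = embed(base_sess, "Hello world")
b = embed(quant_sess, "Hello world")
print("cosine similarity:", float((a * b).sum()))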

Following are the commands used for the ONNX export and the int8 AVX2 quantization.

Base ONNX model
optimum-cli export onnx --model BAAI/bge-m3 bge-m3-base-onnx

Int8 AVX2 quantized model
optimum-cli onnxruntime quantize --onnx_model bge-m3-base-onnx --avx2 -o bge-m3-int8-avx2
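
For completeness, the same dynamic int8 AVX2 quantization can also be expressed through Optimum's Python API (a sketch assuming the ORTQuantizer / AutoQuantizationConfig interface; paths match the CLI commands above):

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic int8 quantization targeting AVX2, mirroring the CLI call above.
quantizer = ORTQuantizer.from_pretrained("bge-m3-base-onnx")
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
quantizer.quantize(save_dir="bge-m3-int8-avx2", quantization_config=qconfig)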

I would like to know why there is such a huge difference in the vectors generated by the two models, or whether I am converting something incorrectly.

Quantization, or even just casting, can cause slight changes in inference results, which is normal. However, if the results diverge significantly, there may be an issue.

For example, the framework may be configured to skip certain operations, or there may be an unknown bug.
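
One quick check is to count the node types in the quantized graph and see which operators were (or were not) actually converted; dynamically quantized graphs typically contain ops such as MatMulInteger and DynamicQuantizeLinear. A minimal sketch, assuming the default quantized file name:

import onnx
from collections import Counter

# Tally node types in the quantized graph; the presence (or absence) of
# MatMulInteger / DynamicQuantizeLinear nodes shows what was quantized.
model = onnx.load("bge-m3-int8-avx2/model_quantized.onnx")
print(Counter(node.op_type for node in model.graph.node).most_common(15))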

@John6666 thanks for the reply. Is there any way to debug what has gone wrong? I am asking because I don't get any errors or warnings while quantizing the model.

There are ways to output intermediate results for debugging, but I think you can also try to isolate the problem or search for existing issues that may be related. The official ONNX quantization FAQ also seems useful. Have you tried reduce-range, for example?
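
For example, reduce-range can be enabled directly with ONNX Runtime's dynamic quantization API (a minimal sketch; the file paths are placeholders and the keyword arguments assume the current quantize_dynamic signature):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Re-quantize with reduce_range=True, which uses a narrower weight range and
# can reduce int8 saturation on CPUs without VNNI support.
quantize_dynamic(
    "bge-m3-base-onnx/model.onnx",
    "model_quantized_reduce_range.onnx",
    weight_type=QuantType.QInt8,
    reduce_range=True,
)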

Also, regarding ONNX, you can get reliable information by contacting the ONNX Community members on Hugging Face. :grinning_face:

Thanks for the response @John6666
