Inference of a BERT INT8-quantized model gives incorrect results on an AVX2-only CPU (no AVX-512 support). Please note this issue is specific to INT8 quantization on AVX2 CPUs: the INT8-quantized model works correctly on an AVX-512 CPU, and the UINT8 (unsigned int) quantized model works correctly on both AVX2 and AVX-512 CPUs.
It is possible to reproduce this bug on any CPU by running inference under valgrind (valgrind doesn't support AVX-512). Moreover, valgrind reports access to uninitialized variables during inference, which is likely related to the incorrect prediction results:
```
Conditional jump or move depends on uninitialised value(s)
at 0x515C8E7: onnxruntime::DynamicQuantizeLinear<unsigned char>::Compute(onnxruntime::OpKernelContext*) const (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x5346538: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<int, std::allocator<int> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x533233B: onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x5334318: onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x4EAF207: onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<OrtValue, std::allocator<OrtValue> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x4E7CF63: OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (in /home/jag/bootstrap/local/lib/libonnxruntime.so.1.5.3)
by 0x13A2A6: Run (onnxruntime_cxx_inline.h:475)
by 0x13A2A6: Run (onnxruntime_cxx_inline.h:466)
```
**System information**
- OS Platform and Distribution: Debian Buster
- ONNX Runtime installed from (source or binary): source
- ONNX Runtime version: 1.5.3
- Python version: 3.7.3
- GCC/Compiler version (if compiling from source): GCC 8.3.0