Optimum arm64 quantized models on Apple Silicon (M1)

Hi everyone,

I’m running some quantization experiments on my 13-inch M1 MacBook Pro with Optimum and ONNX Runtime. The arm64-quantized model takes about half as long per inference pass as the baseline ORT model without quantization. The avx2-quantized model is also faster than the baseline, but not as fast as the arm64 model (see screenshot). Does Optimum actually use arm64 instructions during inference with arm64-quantized models? Or is the speedup just the result of some other default optimizations?
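
For reference, here is a minimal sketch of how a timing comparison like this could be set up. It uses only the Python standard library. The names `median_latency` and `compare` are hypothetical helpers, not part of Optimum or ONNX Runtime, and the dummy workloads stand in for the real baseline, arm64-quantized, and avx2-quantized models. It also prints `platform.machine()`, since a Python interpreter running under Rosetta reports `x86_64` rather than `arm64`, which could affect a comparison like this one.

```python
import platform
import statistics
import time


def median_latency(run_once, warmup=3, repeats=20):
    """Return the median wall-clock time (seconds) of one call to run_once."""
    # Warm-up calls are executed but not timed, so one-off setup costs
    # (lazy initialization, caches) do not skew the measurement.
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(candidates, **kwargs):
    """Time every callable in a {name: callable} dict with the same settings."""
    return {name: median_latency(fn, **kwargs) for name, fn in candidates.items()}


if __name__ == "__main__":
    # 'arm64' for a native Apple Silicon interpreter, 'x86_64' under Rosetta.
    print("machine:", platform.machine())

    # Placeholder workloads (assumption): replace each lambda with a call that
    # runs one inference pass of the corresponding model on identical inputs.
    candidates = {
        "baseline": lambda: sum(range(200_000)),
        "arm64-quantized": lambda: sum(range(100_000)),
        "avx2-quantized": lambda: sum(range(150_000)),
    }
    for name, seconds in compare(candidates).items():
        print(f"{name}: {seconds * 1e3:.3f} ms")
```

Using the median over several timed runs after a warm-up makes the numbers less sensitive to one-off stalls than a single measurement would be.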