Different results from HF and ONNX

Hi there!

I converted a model to ONNX using python -m transformers.onnx --model=dslim/bert-large-NER onnx and loaded it in Java with the onnxruntime library. For the same input, the ONNX model gives different results than the original model run through HF (Hugging Face Transformers).

I found onnxruntime issue #14363 on GitHub ("[Bug] Attention and QAttention don't work properly in some cases") and wanted to check whether there is a fix or workaround for it that I could use.
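
In case it helps to pin down how far apart the two outputs are, here is a minimal sketch of a comparison, assuming the logits from both runs have been dumped as flat lists of floats (tokens × labels, row-major). The helper names (max_abs_diff, predicted_labels, label_mismatches) and the numbers in the example are hypothetical placeholders, not actual output from dslim/bert-large-NER. It only separates small numerical drift from differences that actually change the predicted tag.

```python
import math
from typing import List, Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat logit lists."""
    if len(reference) != len(candidate):
        raise ValueError("logit lists must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def predicted_labels(flat_logits: Sequence[float], num_labels: int) -> List[int]:
    """Argmax label index per token, for logits stored row-major (tokens x labels)."""
    if num_labels <= 0 or len(flat_logits) % num_labels != 0:
        raise ValueError("length of flat_logits must be a multiple of num_labels")
    labels = []
    for start in range(0, len(flat_logits), num_labels):
        row = flat_logits[start:start + num_labels]
        labels.append(max(range(num_labels), key=row.__getitem__))
    return labels


def label_mismatches(reference: Sequence[float], candidate: Sequence[float],
                     num_labels: int) -> List[int]:
    """Token positions where the two runs predict a different label."""
    ref = predicted_labels(reference, num_labels)
    cand = predicted_labels(candidate, num_labels)
    if len(ref) != len(cand):
        raise ValueError("both runs must cover the same number of tokens")
    return [i for i, (a, b) in enumerate(zip(ref, cand)) if a != b]


# Made-up example: 2 tokens, 3 labels. Real values would come from the HF
# model and from the ONNX Runtime session for the same tokenized input.
hf_logits = [0.1, 2.0, -1.0, 3.0, 0.5, 0.2]
onnx_logits = [0.1, 2.0001, -1.0, 0.4, 0.5, 0.2]

diff = max_abs_diff(hf_logits, onnx_logits)
mismatched = label_mismatches(hf_logits, onnx_logits, num_labels=3)
print(f"max abs diff: {diff:.4f}, mismatched token positions: {mismatched}")
```

If the maximum difference is tiny and no labels flip, the gap is probably ordinary floating-point noise. A large difference, or flipped labels, would point to a real discrepancy such as the one described in the linked issue, or to the two runs not receiving identical input_ids, attention_mask and token_type_ids.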