I’m having a similar issue and was wondering if you or @pierreguillou had any updates on how to convert fine-tuned sequence models to quantized ONNX models.
When I try the old conversion method, I get an import error that appears to be an ONNX bug. Below is the command and error:
```
python C:\Users\flabelle002\Anaconda3\envs\hf2\Lib\site-packages\transformers\convert_graph_to_onnx.py --framework pt --model ./local_cola_model --tokenizer bert-base-cased --quantize bert-base-uncased.onnx
```
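In case it helps, this is roughly how I understand the two steps the script performs, written out explicitly in Python. The output paths are placeholders, and the `quantize_dynamic` call is my assumption about the newer onnxruntime quantization API, not something taken from the script itself:

```python
# Minimal sketch, assuming the script's behavior can be reproduced as:
# (1) export the fine-tuned PyTorch model to ONNX, then
# (2) apply dynamic int8 quantization with onnxruntime.
from pathlib import Path

from transformers.convert_graph_to_onnx import convert
from onnxruntime.quantization import quantize_dynamic, QuantType

# Step 1: export the fine-tuned model with its tokenizer to an ONNX graph.
convert(
    framework="pt",
    model="./local_cola_model",
    tokenizer="bert-base-cased",
    output=Path("onnx/model.onnx"),   # placeholder output path
    opset=11,
)

# Step 2: dynamic (weight-only) int8 quantization of the exported graph.
quantize_dynamic(
    model_input="onnx/model.onnx",
    model_output="onnx/model-quantized.onnx",
    weight_type=QuantType.QInt8,
)
```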