Improving decoding speed by converting the model to ONNX

  1. After training a model with UER-py and converting it to the Transformers format, the decoding speed is not very fast, so I want to improve it by converting the model to ONNX:

python -m transformers.onnx --model=bert-base-cased onnx/bert-base-cased/
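
For reference, the exported model can then be run with onnxruntime along these lines (a minimal sketch; the paths match the export command above, and the sample sentence is just a placeholder):

```python
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

# Assumes the export command above wrote onnx/bert-base-cased/model.onnx
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
session = InferenceSession("onnx/bert-base-cased/model.onnx")

# The tokenizer's numpy output maps directly onto the graph's input names
inputs = tokenizer("Using BERT with ONNX Runtime.", return_tensors="np")
outputs = session.run(None, dict(inputs))
print(outputs[0].shape)  # last_hidden_state: (batch, seq_len, hidden_size)
```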

I ran into some problems when using the converted model: according to the official documentation, the exported ONNX model does not provide a generation method.
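
Since the raw InferenceSession has no generate(), decoding can be done with a hand-written greedy loop. A minimal sketch, assuming the model was exported with --feature=causal-lm so the graph outputs logits (the gpt2 checkpoint, the prompt, and the 20-token limit are only placeholders):

```python
import numpy as np
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

# Assumes an export like:
#   python -m transformers.onnx --model=gpt2 --feature=causal-lm onnx/gpt2/
# so the graph outputs logits instead of last_hidden_state.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
session = InferenceSession("onnx/gpt2/model.onnx")

input_ids = tokenizer("Hello", return_tensors="np")["input_ids"]
attention_mask = np.ones_like(input_ids)

for _ in range(20):  # illustrative cap on newly generated tokens
    logits = session.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )[0]
    # Greedy step: argmax over the logits at the last position
    next_id = logits[:, -1, :].argmax(axis=-1)[:, None]
    input_ids = np.concatenate([input_ids, next_id], axis=-1)
    attention_mask = np.concatenate([attention_mask, np.ones_like(next_id)], axis=-1)
    if next_id.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(input_ids[0]))
```

As far as I understand, the optimum library's ORTModelForCausalLM can also wrap an exported causal LM and restore the usual generate() API.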

  2. I also checked the examples for converting GPT-2 to ONNX. The GPT-2 model decodes normally and the speed is improved.

However, an exception occurred after I replaced it with the converted Chinese model.

I hope I can get some help here, thank you!