Hey @heinz, there’s a notebook you can use to get started: transformers/04-onnx-export.ipynb in the huggingface/transformers repo on GitHub.
The main thing you need to do is create an ORT InferenceSession, e.g. with a function like the following:
from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions, get_all_providers

def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few session options that can affect performance (suggested by Microsoft)
    options = SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

    # Load the model as a graph and prepare the requested backend
    session = InferenceSession(model_path, options, providers=[provider])
    session.disable_fallback()

    return session
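For example, to get a CPU-backed session (the ONNX path here is just a placeholder for wherever you exported your model):

cpu_session = create_model_for_provider("onnx/bert-base-uncased.onnx", "CPUExecutionProvider")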
Once you create a session, you’ll still need to tokenize and encode the inputs; there’s a rough sketch of that below. You can also find some additional examples in the ORT repo, e.g. onnxruntime/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb in the microsoft/onnxruntime repo on GitHub.
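Here’s a minimal sketch of that part, reusing the cpu_session from above and assuming a BERT-style export whose graph input names match the tokenizer’s output keys (input_ids, attention_mask, token_type_ids); adjust the checkpoint name to your own model:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# ORT expects a dict of NumPy arrays keyed by the graph's input names
encoded = tokenizer("Hello, ONNX Runtime!", return_tensors="np")
outputs = cpu_session.run(None, dict(encoded))  # None -> return all outputs

print(outputs[0].shape)  # e.g. (1, seq_len, hidden_size) for the last hidden state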