Inference with Finetuned BERT Model converted to ONNX does not output probabilities

Hey @heinz, there’s a notebook here that you can use to get started: transformers/04-onnx-export.ipynb at master · huggingface/transformers · GitHub

The main thing you need to do is create an ONNX Runtime (ORT) InferenceSession, e.g. with the following function:

from onnxruntime import (
    GraphOptimizationLevel,
    InferenceSession,
    SessionOptions,
    get_all_providers,
)


def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few session options that can affect performance (suggested by Microsoft)
    options = SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

    # Load the model as a graph and prepare the chosen execution provider
    session = InferenceSession(model_path, options, providers=[provider])
    session.disable_fallback()

    return session
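
For example (the model path here is just a placeholder for wherever you saved your exported graph):

session = create_model_for_provider("onnx/bert.onnx", "CPUExecutionProvider")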

Once you've created a session, you still need to tokenize and encode the inputs yourself and pass them to session.run (see the sketch below). Note that the exported classification head returns raw logits, not probabilities, so you have to apply a softmax afterwards. You can find some additional examples in the ORT repo as well, e.g. onnxruntime/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb at master · microsoft/onnxruntime · GitHub
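
As a rough sketch of the end-to-end flow (the checkpoint name, model path, and example text are placeholders, and the exact input names depend on how you exported the graph):

import numpy as np
from transformers import AutoTokenizer

# Placeholder checkpoint / path -- swap in your own fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
session = create_model_for_provider("onnx/bert.onnx", "CPUExecutionProvider")

# Tokenize and return NumPy arrays, which is what ONNX Runtime expects
inputs = tokenizer("This movie was great!", return_tensors="np")

# Keep only the inputs that the exported graph actually declares (names vary by export)
input_names = {i.name for i in session.get_inputs()}
onnx_inputs = {k: v for k, v in inputs.items() if k in input_names}

# The classification head returns raw logits with shape (batch_size, num_labels)
logits = session.run(None, onnx_inputs)[0]

# Apply a (numerically stable) softmax to turn the logits into class probabilities
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
print(probs)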
