Inference with Finetuned BERT Model converted to ONNX does not output probabilities

Environment info

  • transformers version: 3.5.1
  • Platform: Linux-4.14.203-116.332.amzn1.x86_64-x86_64-with-glibc2.10
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.7.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (True)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Information

Model I am using (Bert, XLNet …): Bert

The problem arises when using:

  • my own modified scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Trained a HuggingFace Transformers BertForSequenceClassification model on a custom dataset with the PyTorch backend.
  2. Used the provided convert_graph_to_onnx.py script to convert the model (from a saved checkpoint) to ONNX format.
  3. Loaded the model with ONNXRuntime.
  4. Instantiated BertTokenizer.from_pretrained("bert-base-uncased") and fed various input texts to its encode_plus method.
  5. Fed the outputs of this to the ONNXRuntime session (steps 3–5 are sketched below).

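For reference, here is a minimal sketch of steps 3–5 (the model path and input text are placeholders, not taken from the original report):

from onnxruntime import InferenceSession
from transformers import BertTokenizer

sess = InferenceSession("bert.onnx")  # placeholder path to the exported model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus returns a dict of NumPy arrays with return_tensors="np"
tokens = tokenizer.encode_plus("some input text", return_tensors="np")
# Keep only the inputs the graph declares, then run the session
input_names = {i.name for i in sess.get_inputs()}
outputs = sess.run(None, {k: v for k, v in tokens.items() if k in input_names})
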
Expected behavior

The expected behavior is that sess.run on the aforementioned inputs should return an array of dimension (1, 100) (corresponding to 100 classes), with each value between 0 and 1 and all entries summing to 1. We get the correct dimension; however, the values range from about -3.04 to 7.14 (we are unsure what these values represent).

Hi @nsingh, without seeing your code it's hard to know exactly what's going wrong, but based on this comment

We get the correct dimension; however, the values range from about -3.04 to 7.14 (we are unsure what these values represent).

my guess is that you are getting the logits from the model instead of the predicted class probabilities. I ran into this problem recently, and the solution was to specify pipeline_name="sentiment-analysis" to load the model for a TextClassificationPipeline:

from transformers.convert_graph_to_onnx import convert

model_ckpt = ...
tokenizer = ...
onnx_model_path = ...
convert(framework="pt", model=model_ckpt, tokenizer=tokenizer,
 output=onnx_model_path, opset=12, pipeline_name="sentiment-analysis")

By default, convert_graph_to_onnx uses the feature-extraction pipeline, which would explain why you're seeing negative numbers (i.e. the logits).
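
As a quick sanity check, you can also turn the logits into probabilities yourself with a softmax; a minimal sketch (logits stands for the (1, 100) array returned by sess.run):

import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Shift by the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

probs = softmax(np.array([[-3.04, 0.0, 7.14]]))  # toy logits; each row sums to 1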

Hi @lewtun, I'm new to ONNX and am having difficulty moving my pipeline to ONNXRuntime. Currently my workflow looks like this:

from transformers import (AutoConfig, AutoModelForSequenceClassification,
                          AutoTokenizer, TextClassificationPipeline)

config = AutoConfig.from_pretrained(path_finetuned)
model = AutoModelForSequenceClassification.from_pretrained(path_finetuned, config=config)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

I am able to convert the model to ONNX with:
convert(framework="pt", model="distilbert-base-cased", output=Path("/onnx/fine-tuned.onnx"), pipeline_name="sentiment-analysis", opset=13)

However, I'm having difficulty wrapping the ONNX graph in a similar pipeline, i.e. classifier = TextClassificationPipeline(...). Could you share your approach?

Hey @heinz, there’s a notebook here that you can use to get started: transformers/04-onnx-export.ipynb at master · huggingface/transformers · GitHub

The main thing you need to do is create an ORT InferenceSession with e.g. the following function:

from onnxruntime import (GraphOptimizationLevel, InferenceSession,
                         SessionOptions, get_all_providers)


def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    # Make sure the requested execution provider is available
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few session properties that can have an impact on performance (suggested by Microsoft)
    options = SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

    # Load the model as a graph and prepare the backend for the chosen provider
    session = InferenceSession(model_path, options, providers=[provider])
    session.disable_fallback()

    return session

Once you create a session, you'll still need to tokenize and encode the inputs. You can find some additional examples in the ORT repo as well, e.g. onnxruntime/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb at master · microsoft/onnxruntime · GitHub
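
Putting it together, here's a rough end-to-end sketch using the function above (the model path is a placeholder, and the softmax post-processing is my own addition mirroring what TextClassificationPipeline does, not something from the notebook):

import numpy as np
from transformers import AutoTokenizer

session = create_model_for_provider("onnx/fine-tuned.onnx", "CPUExecutionProvider")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

inputs = tokenizer("This movie was great!", return_tensors="np")
# Drop any keys the graph does not declare (e.g. DistilBERT has no token_type_ids)
input_names = {i.name for i in session.get_inputs()}
logits = session.run(None, {k: v for k, v in inputs.items() if k in input_names})[0]

probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax over the class axis
print(probs.argmax(-1))  # predicted class id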
