Inference with Finetuned BERT Model converted to ONNX does not output probabilities

Environment info

  • transformers version: 3.5.1
  • Platform: Linux-4.14.203-116.332.amzn1.x86_64-x86_64-with-glibc2.10
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.7.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (True)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Information

Model I am using (Bert, XLNet …): Bert

The problem arises when using:

  • my own modified scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Trained a HuggingFace Transformers BertForSequenceClassification model on a custom dataset with the PyTorch backend.
  2. Used the provided convert_graph_to_onnx.py script to convert the model (from a saved checkpoint) to ONNX format.
  3. Loaded the model with ONNXRuntime.
  4. Instantiated BertTokenizer.from_pretrained("bert-base-uncased") and fed various input texts to its encode_plus method.
  5. Fed the outputs of this to the ONNXRuntime session (steps 3–5 are sketched below).

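For reference, here is a minimal sketch of steps 3–5 (the model path and input text are placeholders, not taken from the original report):

from onnxruntime import InferenceSession
from transformers import BertTokenizer

sess = InferenceSession("bert.onnx")  # placeholder path to the exported model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus returns a dict of NumPy arrays with return_tensors="np"
tokens = tokenizer.encode_plus("some input text", return_tensors="np")
# Keep only the inputs the graph declares, then run the session
input_names = {i.name for i in sess.get_inputs()}
outputs = sess.run(None, {k: v for k, v in tokens.items() if k in input_names})
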
Expected behavior

The expected behavior is that sess.run on the aforementioned inputs should return an array of dimension (1, 100) (corresponding to 100 classes), with each value between 0 and 1 and all entries summing to 1. We get the correct dimension; however, the values range from about -3.04 to 7.14 (we are unsure what these values represent).

Hi @nsingh, without seeing your code it's hard to know exactly what's going wrong, but based on this comment

We get the correct dimension; however, the values range from about -3.04 to 7.14 (we are unsure what these values represent).

my guess is that you are getting the logits from the model instead of the predicted class probabilities. I ran into this problem recently, and the solution was to specify pipeline_name="sentiment-analysis" to load the model for a TextClassificationPipeline:

from transformers.convert_graph_to_onnx import convert

model_ckpt = ...
tokenizer = ...
onnx_model_path = ...
convert(framework="pt", model=model_ckpt, tokenizer=tokenizer,
 output=onnx_model_path, opset=12, pipeline_name="sentiment-analysis")

By default, convert_graph_to_onnx uses the feature-extraction pipeline, which would explain why you're seeing negative numbers (i.e. the logits).
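
As a quick sanity check, you can also turn the logits into probabilities yourself with a softmax; a minimal sketch (logits stands for the (1, 100) array returned by sess.run):

import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Shift by the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

probs = softmax(np.array([[-3.04, 0.0, 7.14]]))  # toy logits; each row sums to 1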

Hi @lewtun, I'm new to ONNX and am having difficulty moving my pipeline to ONNXRuntime. Currently my workflow looks like this:

from transformers import (AutoConfig, AutoModelForSequenceClassification,
                          AutoTokenizer, TextClassificationPipeline)

config = AutoConfig.from_pretrained(path_finetuned)
model = AutoModelForSequenceClassification.from_pretrained(path_finetuned, config=config)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

I am able to convert the model to ONNX with:
convert(framework="pt", model="distilbert-base-cased", output=Path("/onnx/fine-tuned.onnx"), pipeline_name="sentiment-analysis", opset=13)

However, I'm having difficulty wrapping the ONNX graph in a similar pipeline, i.e. classifier = TextClassificationPipeline(...). Could you share your approach?

Hey @heinz, there’s a notebook here that you can use to get started: transformers/04-onnx-export.ipynb at master · huggingface/transformers · GitHub

The main thing you need to do is create an ORT InferenceSession with e.g. the following function:

from onnxruntime import (GraphOptimizationLevel, InferenceSession,
                         SessionOptions, get_all_providers)


def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    # Make sure the requested execution provider is available
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few session properties that can have an impact on performance (suggested by Microsoft)
    options = SessionOptions()
    options.intra_op_num_threads = 1
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

    # Load the model as a graph and prepare the backend for the chosen provider
    session = InferenceSession(model_path, options, providers=[provider])
    session.disable_fallback()

    return session

Once you create a session, you'll still need to tokenize and encode the inputs. You can find some additional examples in the ORT repo as well, e.g. onnxruntime/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb at master · microsoft/onnxruntime · GitHub
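
Putting it together, here's a rough end-to-end sketch using the function above (the model path is a placeholder, and the softmax post-processing is my own addition mirroring what TextClassificationPipeline does, not something from the notebook):

import numpy as np
from transformers import AutoTokenizer

session = create_model_for_provider("onnx/fine-tuned.onnx", "CPUExecutionProvider")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

inputs = tokenizer("This movie was great!", return_tensors="np")
# Drop any keys the graph does not declare (e.g. DistilBERT has no token_type_ids)
input_names = {i.name for i in session.get_inputs()}
logits = session.run(None, {k: v for k, v in inputs.items() if k in input_names})[0]

probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax over the class axis
print(probs.argmax(-1))  # predicted class id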
