ONNX TF BERT sentiment-analysis inputs and outputs

I trained a model based on bert-large-cased using the run_text_classification.py TensorFlow example, then converted the model to ONNX with the convert_graph_to_onnx.py tool.

That looks like this:

from pathlib import Path
from transformers import convert_graph_to_onnx

convert_graph_to_onnx.convert(
    framework="tf",
    model="output",
    output=Path("model/model.onnx"),
    opset=12,
    tokenizer="output",
    use_external_format=True,
    pipeline_name="sentiment-analysis",
)

I’m having some difficulty understanding the inputs and outputs. I’m calling the model from Rust via onnxruntime. The ONNX model’s input shape is (?, 5). I’m feeding it input_ids, attention_mask, and token_type_ids constructed with the tokenizers Rust library. For example,

these are the inputs I’m using:

[[101, 1, 0, 0, 0],
 [1104, 1, 0, 0, 0],
 [23120, 1, 0, 0, 0],
 [188, 1, 0, 0, 0],
 [19091, 1, 0, 0, 0],
 [8124, 1, 0, 0, 0],
 [1111, 1, 0, 0, 0],
 [3062, 1, 0, 0, 0],
 [1105, 1, 0, 0, 0],
 [15470, 1, 0, 0, 0],
 [119, 1, 0, 0, 0],
 [102, 1, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
...
...

I truncate and pad the tokens to length 128, since I used a max length of 128 when training the model.
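For what it’s worth, the exported pipeline graphs I’ve seen take input_ids, attention_mask, and token_type_ids as three separate (batch, seq_len) tensors rather than one (seq_len, 5) matrix, so a row-per-token layout like the one above would make the runtime treat every token as its own batch item. A minimal numpy sketch of the layout I’d expect (the token ids are copied from the example above; the tensor names are the usual ones produced by convert_graph_to_onnx, not something I’ve verified against this particular graph):

```python
import numpy as np

MAX_LEN = 128  # same max length used during training

def pad_to_max(ids, max_len=MAX_LEN):
    """Truncate or zero-pad a list of token ids to max_len."""
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))

# token ids from the example above (101 = [CLS], 102 = [SEP])
token_ids = [101, 1104, 23120, 188, 19091, 8124, 1111,
             3062, 1105, 15470, 119, 102]

input_ids = np.array([pad_to_max(token_ids)], dtype=np.int64)  # (1, 128)
attention_mask = (input_ids != 0).astype(np.int64)             # 1 on real tokens, 0 on padding
token_type_ids = np.zeros_like(input_ids)                      # single segment, all zeros

print(input_ids.shape, attention_mask.shape, token_type_ids.shape)
```

With this layout the model should return one logits row per sequence, i.e. shape (1, 4) for a single example, instead of one row per token.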

The outputs are logits (I think) with shape (?, 4). My sentiment-analysis task has 4 classes, so that part makes sense.

[[0.87741166, 0.4000733, -2.557633, 1.5771139],
 [-0.7227318, -0.14528184, 2.4809465, -1.7312673],
 [0.88585603, 0.392128, -2.54713, 1.5852065],
 [-0.89909637, 0.5074229, 2.3639672, -1.8381689],
 [0.8940967, 0.40258378, -2.5756738, 1.5999701],
...
...

Since this is a sequence-level sentiment-analysis task, why am I getting a row of logits for each token?

Any advice on things to check? Does anything stand out as obviously wrong?

Usually, the last layer of a classification model (in your case TFBertForSequenceClassification) produces raw prediction values as real numbers in (-infinity, +infinity). These raw, unconstrained prediction values are commonly known as logits.

After this, we usually want to normalize these outputs into a probability distribution over the predicted output classes. To do that, we apply a normalization function: sigmoid for binary classification, softmax for multi-class classification.
The output of the softmax function is a probability distribution, and to convert it into a class we just take the argmax.

To do that with your onnx outputs, you can do something like this:

import numpy as np
from scipy.special import softmax

# outputs[0] is the (batch, num_labels) logits array; take the first example,
# turn its 4 logits into probabilities, then pick the most likely class
np.argmax(softmax(outputs[0][0], axis=0))
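If you want class predictions for a whole batch of logits like the one you posted, the same idea applies row-wise. A self-contained sketch using the first two rows of your output (the softmax here is hand-rolled so the snippet doesn’t need scipy):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# first two logits rows from the output above
logits = np.array([
    [0.87741166, 0.4000733, -2.557633, 1.5771139],
    [-0.7227318, -0.14528184, 2.4809465, -1.7312673],
])

probs = softmax(logits, axis=-1)   # each row now sums to 1
preds = np.argmax(probs, axis=-1)  # one class index per row
print(preds)  # → [3 2]
```

Note that the per-row class indices differ, which is another hint that each row is being scored as a separate input rather than as tokens of one sequence.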

Thanks! This moved me in the right direction.

This helped a lot, thanks.