Question regarding TF DistilBert For Sequence Classification

I have successfully fine-tuned TFDistilBertForSequenceClassification to distinguish toxic from non-toxic comments in my datasets. Is there a way to use the same model to gauge which sentence in a pair of toxic sentences is more (or less) toxic? That is, can I access the probabilities produced by the classifier and compare the toxicity of two toxic sentences?


You can access the probabilities as follows:

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import tensorflow as tf

# Note: load your fine-tuned checkpoint here; loading the base model
# gives you a randomly initialized classification head.
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)

# Softmax over the logits turns them into class probabilities.
probabilities = tf.math.softmax(outputs.logits, axis=-1)

The result is a tensor of shape (batch_size, num_labels) containing the per-class probability for each example in the batch.
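To compare two toxic sentences, you can batch them together and compare their probabilities for the toxic class. A minimal sketch (the sentence strings are placeholders, and it assumes your fine-tuned model maps label index 1 to "toxic"; check your own label mapping, and substitute your fine-tuned checkpoint path for the base model name):

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import tensorflow as tf

# Replace with the path to your fine-tuned checkpoint.
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

sentences = ["first toxic comment", "second toxic comment"]
# padding=True pads both sentences to the same length so they batch together.
inputs = tokenizer(sentences, return_tensors="tf", padding=True)
outputs = model(inputs)
probs = tf.math.softmax(outputs.logits, axis=-1)  # shape (2, num_labels)

# Assumption: index 1 is the "toxic" class in your label mapping.
toxic_probs = probs[:, 1]
more_toxic = sentences[int(tf.argmax(toxic_probs))]

Whichever sentence has the higher toxic-class probability is the one the model considers more toxic; the magnitude of the gap between the two probabilities also gives a rough sense of how confident that ordering is.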