I have successfully fine-tuned TFDistilBertForSequenceClassification to distinguish toxic comments from non-toxic ones in my datasets. Is there a way to use the same model to gauge which sentence in a pair of toxic sentences is more (or less) toxic? Is there a way to access the probability produced by the classifier to compare the toxicity of two toxic sentences?
Hi,
You can access the probabilities as follows:
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import tensorflow as tf

# Load the tokenizer and model; in your case, pass the path to your
# fine-tuned checkpoint instead of the base 'distilbert-base-uncased' weights.
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)

# Turn the raw logits into per-class probabilities.
probabilities = tf.math.softmax(outputs.logits, axis=-1)
print(probabilities)
The probabilities are a tensor of shape (batch_size, num_labels), containing the probabilities per class for every example in the batch.
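So to gauge which of two toxic sentences is more toxic, you can tokenize both sentences as a single batch, take the softmax probability of the toxic class for each, and compare the two scores. Below is a minimal sketch; the checkpoint path "./toxicity-model" and the assumption that label index 1 corresponds to the toxic class are placeholders you would replace with your own fine-tuned model and label mapping.

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
import tensorflow as tf

# Hypothetical path to the fine-tuned checkpoint and index of the "toxic" label.
model_dir = "./toxicity-model"
toxic_label_index = 1

tokenizer = DistilBertTokenizer.from_pretrained(model_dir)
model = TFDistilBertForSequenceClassification.from_pretrained(model_dir)

sentences = ["first toxic comment", "second toxic comment"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="tf")
outputs = model(inputs)

# Probability of the toxic class for each sentence in the batch.
probabilities = tf.math.softmax(outputs.logits, axis=-1)
toxic_scores = probabilities[:, toxic_label_index].numpy()

# The sentence with the higher toxic-class probability is treated as "more toxic".
more_toxic = sentences[int(toxic_scores.argmax())]
print(toxic_scores, more_toxic)

Keep in mind the model was trained for a binary decision, so the class probability is only a rough proxy for how toxic a sentence is, but it does give you an ordering between two sentences.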