My friend who wishes to remain anonymous asked a good question about T5 that I couldn’t answer:
Say we have a model that predicts sentiment, with answers “positive/negative/neutral”. For something like RoBERTa we’d add a classification layer, slap on a softmax, and get both the argmax prediction and some notion of probability across the three classes (as well as entropy).
For T5, we just get a text reply.
But of course, if we looked at the output of T5’s softmax over the vocabulary, we’d see p(“positive”) etc., assuming the response is a single token.
Has anyone tried to do this already or seen examples/notebooks like this?
Ideally, we want to ask our model several questions. Without worrying too much about the conditional logic, we’d like to be able to measure the probability of text outputs, including some rare categories (that are nonetheless present in our training set), as well as look for low- and high-entropy predictions.
If nobody has done this, any code pointers where to look would be helpful.
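(For reference, the RoBERTa-style setup being described might look roughly like the sketch below; roberta-base with a fresh, untrained 3-way head is only a placeholder, not a fine-tuned sentiment model.)
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
model.eval()

enc = tokenizer("I love chocolate", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits              # shape: (1, 3)

probs = F.softmax(logits, dim=-1)             # p(positive), p(negative), p(neutral)
pred = probs.argmax(dim=-1)                   # argmax prediction
entropy = -(probs * probs.log()).sum(dim=-1)  # uncertainty over the three classes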
I’m not sure I understand the difference between what you are describing and how the GLUE dataset is handled in the T5 paper.
I’m also not sure what the question means here. Are you trying to ask if someone has used T5 for classification? Then yes, I’ve fine-tuned it for both binary and multi-class classification here.
As for measuring the probability, this paper used T5 in a really interesting way for document ranking. First they train the model to predict true if the doc is related to the query and false if not. Then, for ranking, they apply a softmax only over the logits of the “true” and “false” tokens and rerank using the probability assigned to the “true” token.
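A rough sketch of that scoring step might look like the following (t5-small here is just a stand-in, since the paper fine-tunes the model on the true/false task first, and the query/document template is only illustrative):
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

# look up the ids of "true" and "false" (each is a single sentencepiece token)
true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
false_id = tokenizer("false", add_special_tokens=False).input_ids[0]

query, doc = "what causes rain", "Rain forms when water vapour condenses ..."
enc = tokenizer(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
decoder_input_ids = torch.full((1, 1), tokenizer.pad_token_id, dtype=torch.long)

with torch.no_grad():
    logits = model(**enc, decoder_input_ids=decoder_input_ids).logits[:, 0, :]

# softmax over just the "true"/"false" logits; p("true") is the relevance score
score = F.softmax(logits[:, [true_id, false_id]], dim=-1)[:, 0]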
To answer the first part of your question: yes, I have tried T5 for multi-class classification. It generates tokens based on the class labels (a single token or multiple tokens, depending on how the class label is tokenized).
The second part of the question is not clear to me. Please explain more.
Got a more specific version of the question:
How should I fix this code to use t5 without finetuning for sentiment:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(torch_device)

input_sent = ['sentiment review: I love chocolate', 'sentiment review: I love chocolate']
labels = ['positive', 'negative']

input_ids = tokenizer(input_sent, return_tensors='pt', padding=True).input_ids.to(torch_device)
target_ids = tokenizer(labels, return_tensors='pt', padding=True).input_ids.to(torch_device)

outputs = model(input_ids=input_ids, labels=target_ids, use_cache=False, return_dict=True)
I got the sentiment review prefix from https://github.com/google-research/text-to-text-transfer-transformer/issues/109 @valhalla
T5 was trained on sst2 as part of its multi-task pre-training mixture, so to use T5 for sentiment without fine-tuning, use the prefix sst2 sentence: and pass that to the model. You can do it two ways:
from transformers import T5ForConditionalGeneration, T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
text = "sst2 sentence: it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight"
enc = tokenizer(text, return_tensors="pt")
tokens = model.generate(**enc)
tokenizer.batch_decode(tokens)
=> ['positive']
or use the text2text-generation pipeline:
from transformers import pipeline
t5_sentiment = pipeline("text2text-generation", model="t5-small")
text = "sst2 sentence: it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight"
t5_sentiment(text)
=> [{'generated_text': 'positive'}]
That’s cool suraj, thanks! Is it possible to do that with forward?
forward, as in without using generate?
As T5 is trained with a text-to-text approach, we need to generate the output as text, either by calling forward manually or by using generate. If we wished to treat this as a discriminative task, we could take the same approach as BART, where we feed the same text to both the encoder and the decoder, pool the hidden state of the final eos token, and pass that to a classification head; this is how BartForSequenceClassification works. Not sure how this will work for T5, haven’t tried it myself.
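Very roughly, an untested sketch of that idea for T5 could look like this (the T5EncDecClassifier name, the eos pooling, and the linear head are all assumptions made up for illustration, not an existing class):
import torch
import torch.nn as nn
from transformers import T5Model, T5Tokenizer

class T5EncDecClassifier(nn.Module):
    """Untested sketch: feed the same text to encoder and decoder,
    pool the decoder hidden state of the final (eos) token, classify."""
    def __init__(self, model_name="t5-small", num_labels=3):
        super().__init__()
        self.t5 = T5Model.from_pretrained(model_name)
        self.classifier = nn.Linear(self.t5.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        # same ids go to both encoder and decoder
        outputs = self.t5(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=input_ids,
        )
        hidden = outputs.last_hidden_state           # (batch, seq_len, d_model)
        # index of the last real token (the eos added by the tokenizer)
        eos_pos = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), eos_pos]
        return self.classifier(pooled)               # (batch, num_labels)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
clf = T5EncDecClassifier()
enc = tokenizer(["I love chocolate"], return_tensors="pt")
logits = clf(enc.input_ids, enc.attention_mask)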
To answer the original question, you could use forward as shown below to generate the output
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

text = "sst2 sentence: it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight"

with torch.no_grad():
    enc = tokenizer(text, return_tensors="pt")
    # start the decoder with the pad token, i.e. decode a single step
    decoder_input_ids = torch.tensor([tokenizer.pad_token_id]).unsqueeze(0)
    logits = model(**enc, decoder_input_ids=decoder_input_ids)[0]
    tokens = torch.argmax(logits, dim=2)
    sentiments = tokenizer.batch_decode(tokens)
    # => ['positive']
Now, if we wish to measure the probabilities, as I described in the earlier comment, we can take only the logits of the positive and negative tokens and apply a softmax over them. Thankfully, T5 encodes positive and negative as single tokens, so it’s easy to do. The token id for positive is 1465 and for negative it is 2841.
logits = logits.squeeze(1)
# only take the logits of positive and negative
selected_logits = logits[:, [1465, 2841]]
probs = F.softmax(selected_logits, dim=1)
#=> tensor([[0.9820, 0.0180]])
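And since the original question also asked about entropy, you can compute it directly from those probabilities:
entropy = -(probs * probs.log()).sum(dim=1)
# ≈ 0.09 nats for the [0.9820, 0.0180] distribution above; low entropy = confident prediction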
Hope this answers your question.
cc @sshleifer
Incredible answer, thanks a ton!
Sorry for the topic steal; I wasn’t getting much attention on my own T5 topic (Yet another question about T5 prefixes: are they special? - Models - Hugging Face Forums).
Has anyone here used T5 for regression? From the paper that @valhalla links, it seems you could rescale your continuous labels to 0-1 and then use the softmax output for one of two tokens (e.g. true and false) as the prediction in an MSELoss, as in the sketch below. Or is that a nonsense suggestion? The caveat is that it is not always possible to rescale your values to 0-1.
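To make that concrete, here is a rough sketch of what I mean (t5-small, the placeholder input text, and the 0.73 target are only for illustration, and the labels are assumed to be rescaled to 0-1 already):
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# single-token ids for the two anchor outputs
true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
false_id = tokenizer("false", add_special_tokens=False).input_ids[0]

text = "my regression task: some input text here"   # placeholder prefix and input
target = torch.tensor([0.73])                       # continuous label rescaled to 0-1

enc = tokenizer(text, return_tensors="pt")
decoder_input_ids = torch.full((1, 1), tokenizer.pad_token_id, dtype=torch.long)
logits = model(**enc, decoder_input_ids=decoder_input_ids).logits[:, 0, :]

# p("true") becomes the model's continuous prediction
pred = F.softmax(logits[:, [true_id, false_id]], dim=-1)[:, 0]
loss = F.mse_loss(pred, target)
loss.backward()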
Thanks! This helps a great deal. I wanted to know how to make use of this when the class label is split into two or more tokens; for example, for entailment the token ids in the T5 model are [35, 5756, 297].
How can I fit this into selected_logits = logits[:, [1465, 2841]]?
I tried using an array of arrays. Am I missing a tweak?