Hi, I am working on a T5 summarizer and would like to know what the output of trainer.predict().predictions refers to. Also, I saw that we have to use argmax to get the generated summary, but predictions returns a nested array in my case. How do I know which array to use?
Here is my code:
# Train trainer
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer
model = T5ForConditionalGeneration.from_pretrained('t5-base')
output_dir = 'output2'
# fine-tune model using the transformers.Trainer API
training_args = Seq2SeqTrainingArguments(
trainer = Seq2SeqTrainer(
#Evaluate Trainer/ get summaries
pred_args = Seq2SeqTrainingArguments(
trainer = Seq2SeqTrainer(model=model, args=pred_args)
preds = prediction.predictions
labels = prediction.label_ids
preds returns a nested array
Thank you for your help!
Hi @paulynlhx, I am also curious about the generation from setting predict_with_generate=True and having to argmax to get the generation. Did you observe any difference between the generation from using argmax when predict_with_generate is True and when predict_with_generate is False? Does predict_with_generate=True give you better output than
Hi @TopRightExit, it seems that running the trainer without predict_with_generate=True does not return any predictions or labels.
Thank you @paulynlhx.
I was wondering about that because I was using the SFTTrainer, where predict_with_generate does not work, and I was wondering if I should use argmax to get the generation in my compute_metrics function, hence my question to you. Turns out SFTTrainer does not have predict_with_generate, and there were no plans to support it … …
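For the argmax-inside-compute_metrics idea discussed above, here is a minimal sketch rather than a definitive implementation. It assumes that compute_metrics receives a (predictions, label_ids) pair, that predictions is either a logits array of shape (num_examples, seq_len, vocab_size) or a tuple whose first element is that array, and that ignored label positions are marked with -100. The helper name logits_to_token_ids and the token_accuracy metric are made up for illustration. Note that argmax over logits from a forward pass gives teacher-forced predictions, which is not the same thing as autoregressive generation.

```python
import numpy as np

IGNORE_INDEX = -100  # value conventionally used to mark ignored label positions


def logits_to_token_ids(predictions):
    """Greedy token ids from logits (hypothetical helper for this sketch)."""
    # Assumption: if several outputs are returned, the logits come first.
    logits = predictions[0] if isinstance(predictions, tuple) else predictions
    return np.argmax(logits, axis=-1)


def compute_metrics(eval_pred):
    """Token-level accuracy over the positions that are not ignored."""
    predictions, label_ids = eval_pred
    pred_ids = logits_to_token_ids(predictions)
    # Assumption: logits and labels are aligned position by position, as in a
    # seq2seq model. For a causal LM the labels are shifted by one position.
    mask = label_ids != IGNORE_INDEX
    accuracy = (pred_ids[mask] == label_ids[mask]).mean()
    return {"token_accuracy": float(accuracy)}
```

A function with this shape can be passed as compute_metrics=compute_metrics when constructing the trainer. Whether token accuracy is a useful metric for summarization is a separate question.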
@TopRightExit I’m not too sure about the SFTTrainer, but from my code and current understanding (not sure if it's correct), my Seq2SeqTrainer.predict().predictions returns a nested array with 3 layers. The outermost layer is the prediction batch, the middle layer contains the validation set (length of the middle layer = number of input examples), and the innermost layer is the tokens.
The output contains negative floats, which may be logits. Hence, I used softmax to get the probabilities and then used argmax to get the most probable token index at each position. I then decoded the indices to get the generated summary.
import numpy as np
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('t5-base')
preds = trainer.predict(val_dataset).predictions
p1 = preds[0]  # get first batch
generated = []
# Iterate through inputs
for seq in p1:
    # Iterate through the sequence to get the best token at each position
    best_tokens = []
    for t in range(len(seq)):
        # softmax to change logits to probabilities
        prob = np.exp(seq[t]) / np.sum(np.exp(seq[t]))
        # argmax to get the index with the highest probability
        best_index = int(np.argmax(prob))
        best_tokens.append(best_index)
    generated.append(best_tokens)
# Decode each list of token ids into text
decoded = [tokenizer.decode(x, skip_special_tokens=True) for x in generated]
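The loop above can probably be shortened. Softmax is monotonic, so the argmax of the probabilities is the same as the argmax of the raw logits, and np.argmax can work over the whole array at once. The sketch below uses random numbers in place of real model output and assumes a logits array of shape (num_examples, seq_len, vocab_size). The function names are made up for illustration.

```python
import numpy as np


def stable_softmax(logits, axis=-1):
    """Softmax that subtracts the max first to avoid overflow in np.exp."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def greedy_token_ids(logits):
    """Most likely token id at every position, taken over the vocab axis."""
    return np.argmax(logits, axis=-1)


# Stand-in for real logits; the shape is an assumption for this sketch.
rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(4, 6, 10))

ids_from_logits = greedy_token_ids(fake_logits)
ids_from_probs = np.argmax(stable_softmax(fake_logits), axis=-1)

# Both routes select the same token ids, so the softmax step is optional.
assert np.array_equal(ids_from_logits, ids_from_probs)
```

With real logits, the resulting 2-D array of ids could then be passed to tokenizer.batch_decode(ids, skip_special_tokens=True) instead of decoding row by row.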