What does the output of Seq2SeqTrainer predict.predictions refer to and how to get generated summaries

Hi, I am working on a T5 summarizer and would like to know what the output of trainer.predict.predictions refers to. Also, I saw that we have to use argmax to get the generated summary, but predict.predictions returns a nested array in my case. How do I know which array to use?

This is my code:

# Train trainer
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = T5ForConditionalGeneration.from_pretrained('t5-base')

output_dir = 'output2'

# fine-tune model using the transformers.Trainer API
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    num_train_epochs=6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1, 
    prediction_loss_only=True, 
    learning_rate=4e-5,
    evaluation_strategy='steps', 
    save_steps=1000,
    save_total_limit=1, 
    eval_steps=1000, 
    load_best_model_at_end=True,
    metric_for_best_model="rouge1", 
    predict_with_generate=True,
    push_to_hub=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

trainer.train()
# Evaluate trainer / get summaries
pred_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1
)

trainer = Seq2SeqTrainer(model=model, args=pred_args)

prediction = trainer.predict(val_dataset)
preds = prediction.predictions
labels = prediction.label_ids

preds is a nested array.

Thank you for your help!

Hi @paulynlhx , I am also curious about the generation you get from setting predict_with_generate=True and then having to argmax to get the generation. Did you observe any difference between the argmax output when predict_with_generate is True and when it is False? Does predict_with_generate=True give you better output than predict_with_generate=False?

Hi @TopRightExit , it seems that running the trainer without predict_with_generate=True does not return any predictions or labels.
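
For reference, when predict_with_generate=True is set on the arguments actually passed to the trainer, the predictions are generated token IDs rather than logits, so no argmax is needed. A minimal sketch of the clean-up step that usually comes before decoding, assuming the array may be padded with -100 (the helper name clean_generated_ids and the toy IDs are made up for illustration, and 0 stands in for tokenizer.pad_token_id):

```python
import numpy as np

def clean_generated_ids(preds, pad_token_id):
    """Replace the -100 padding value with a real token ID so the rows can be decoded."""
    preds = np.asarray(preds)
    return np.where(preds != -100, preds, pad_token_id)

# Toy stand-in for prediction.predictions with predict_with_generate=True:
# one row of token IDs per input, padded with -100 to a common length.
toy_preds = np.array([[0, 37, 1712, 1, -100],
                      [0, 100, 19, 3, 1]])
cleaned = clean_generated_ids(toy_preds, pad_token_id=0)
# With a real tokenizer: tokenizer.batch_decode(cleaned, skip_special_tokens=True)
```

The same replacement is commonly applied to label_ids before decoding the references.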

Thank you @paulynlhx .

I was wondering about that because I was using the SFTTrainer from trl, where predict_with_generate does not work, and I was wondering whether I should use argmax to get the generation in my compute_metrics function, hence my question to you. It turns out SFTTrainer does not have predict_with_generate, and there were no plans to support it …
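
One possible route for the argmax idea: the base Trainer accepts a preprocess_logits_for_metrics callable that can reduce logits to token IDs before they reach compute_metrics. A rough sketch, assuming the logits have shape (batch, sequence, vocab) and, if a tuple is returned, that the logits come first; whether a given trl version forwards this argument from SFTTrainer is something to check rather than take from here:

```python
import numpy as np

def preprocess_logits_for_metrics(logits, labels):
    """Reduce (batch, seq, vocab) logits to (batch, seq) token IDs."""
    if isinstance(logits, tuple):
        # Some models return extra tensors; the logits are assumed to come first
        logits = logits[0]
    # .argmax(-1) works on both torch tensors and NumPy arrays
    return logits.argmax(-1)

# Toy check, with NumPy standing in for the torch tensors the Trainer would pass
toy_logits = np.array([[[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]])
toy_ids = preprocess_logits_for_metrics(toy_logits, labels=None)
```

It would be passed as Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics), which also keeps the full vocabulary-sized logits from accumulating in memory during evaluation.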

@TopRightExit I'm not too sure about the SFTTrainer, but from my code and current understanding (not sure if it's correct), my Seq2SeqTrainer.predict.predictions returns a nested array with 3 layers. The outermost layer is the prediction batch, the middle layer contains the validation set (length of the middle layer = number of inputs), and the innermost layer is the tokens.
The output contains negative floats which may be logits. Hence, I used softmax to get the probability and then used argmax to get the most probable token index. I then decoded the index to get the generated summary.

import numpy as np
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('t5-base')

preds = trainer.predict(val_dataset).predictions
p1 = preds[0]  # first element of the nested output
generated = []
# Iterate through the inputs
for seq_logits in p1:
    # Pick the best token at each sequence position
    best_token = []
    for t in range(len(seq_logits)):
        # softmax to turn logits into probabilities
        # (shifted by the max so np.exp cannot overflow)
        shifted = seq_logits[t] - np.max(seq_logits[t])
        prob = np.exp(shifted) / np.sum(np.exp(shifted))
        # argmax to get the most probable token index
        best_index = np.argmax(prob)
        best_token.append(best_index)
    generated.append(best_token)

decoded = [tokenizer.decode(x, skip_special_tokens=True) for x in generated]
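
A side note on the loop above: softmax does not change which entry is largest, so taking argmax directly on the logits should give the same token indices, and NumPy can do it for all inputs and positions at once. A small sketch with random toy logits (the shape and variable names are assumptions for illustration, not the real trainer output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy logits: 2 examples, 4 decoding positions, vocabulary of 10 tokens
toy_logits = rng.normal(size=(2, 4, 10))

# Numerically stable softmax over the vocabulary axis
shifted = toy_logits - toy_logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

ids_from_probs = probs.argmax(axis=-1)
ids_from_logits = toy_logits.argmax(axis=-1)  # same indices, no softmax needed
```

On the real output this would be something like preds[0].argmax(axis=-1) followed by tokenizer.batch_decode(..., skip_special_tokens=True). One caveat: these logits come from a forward pass that is fed the reference labels (teacher forcing), so the argmax gives the most likely next token given the reference summary so far, which is not the same as a freely generated summary.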