Hi, I am working on a T5 summarizer and would like to know what the output of `trainer.predict(...).predictions` refers to. I also saw that we have to use argmax to get the generated summary, but `predictions` returns a nested array for me. How do I know which array to use?

This is my code:

```
# Train trainer
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer
model = T5ForConditionalGeneration.from_pretrained('t5-base')
output_dir = 'output2'
# fine-tune model using the transformers.Trainer API
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    num_train_epochs=6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1,
    prediction_loss_only=True,
    learning_rate=4e-5,
    evaluation_strategy='steps',
    save_steps=1000,
    save_total_limit=1,
    eval_steps=1000,
    load_best_model_at_end=True,
    metric_for_best_model="rouge1",
    predict_with_generate=True,
    push_to_hub=False,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()
```

```
# Evaluate trainer / get summaries
pred_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1,
)
trainer = Seq2SeqTrainer(model=model, args=pred_args)
prediction = trainer.predict(val_dataset)
preds = prediction.predictions
labels = prediction.label_ids
```

`preds` comes back as a nested array.

Thank you for your help!

Hi @paulynlhx, I am also curious about the generation you get from setting `predict_with_generate=True` and having to argmax to get the generation. Did you observe any difference between the generation from argmax when `predict_with_generate` is True and when `predict_with_generate` is False? Does `predict_with_generate=True` give you better output than `predict_with_generate=False`?

Hi @TopRightExit, it seems that running the trainer without `predict_with_generate=True` does not return any predictions or labels.


Thank you @paulynlhx.

I was wondering about that because I was using the `SFTTrainer` from `trl`, where `predict_with_generate` does not work, and I was wondering if I should use `argmax` to get the generation in my `compute_metrics` function, hence my question to you. It turns out `SFTTrainer` does not have `predict_with_generate`, and there were no plans to support it.

@TopRightExit I'm not too sure about the `SFTTrainer`, but from my code and current understanding (not sure if it's correct), my `Seq2SeqTrainer.predict(...).predictions` returns a nested array with 3 layers. The outermost layer is the prediction batch, the middle layer contains the validation set (length of the middle layer = number of input examples), and the innermost layer is the tokens.
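One way to check this layout instead of guessing is to print the shape of each array in `predictions`. The snippet below is only a sketch: `describe_predictions` is a hypothetical helper (not part of `transformers`), and `fake_preds` is made-up data standing in for the real output. It also assumes that `predictions` may come back as a tuple of arrays rather than a single array, which would mean the outer layer is a tuple of different model outputs rather than a batch.

```python
import numpy as np

def describe_predictions(preds):
    """Return the shape of every array in `preds`.

    `preds` may be a single array or a tuple/list of arrays.
    """
    if isinstance(preds, (tuple, list)):
        return [np.asarray(p).shape for p in preds]
    return [np.asarray(preds).shape]

# Toy stand-in for trainer.predict(...).predictions; all sizes are made up.
num_examples, seq_len, vocab_size, hidden = 4, 5, 11, 3
fake_preds = (
    np.zeros((num_examples, seq_len, vocab_size)),  # shaped like per-token logits
    np.zeros((num_examples, 7, hidden)),            # shaped like hidden states
)

shapes = describe_predictions(fake_preds)
print(shapes)
```

If the first shape looks like `(number of examples, sequence length, vocabulary size)`, then that first element is most likely the logits and is the one to argmax over.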

The output contains negative floats which may be logits. Hence, I used softmax to get the probability and then used argmax to get the most probable token index. I then decoded the index to get the generated summary.

```
import numpy as np
from transformers import AutoTokenizer

# Load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained('t5-base')
preds = trainer.predict(val_dataset).predictions
p1 = preds[0]  # get first batch
generated = []
# Iterate through inputs
for seq in p1:
    # Pick the best token at each position in the sequence
    best_token = []
    for t in range(len(seq)):
        # softmax to change logits to probabilities
        # (subtracting the max first avoids overflow in np.exp)
        exp = np.exp(seq[t] - np.max(seq[t]))
        prob = exp / np.sum(exp)
        # argmax to get the index with the highest probability
        best_index = int(np.argmax(prob))
        best_token.append(best_index)
    generated.append(best_token)
decoded = [tokenizer.decode(x, skip_special_tokens=True) for x in generated]
```
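The loop above can probably be collapsed into a single NumPy call. Softmax keeps the ordering of the values, so taking argmax on the raw logits should give the same token ids as taking it on the probabilities. Here is a minimal sketch on hand-written toy logits; `logits_to_token_ids` and `toy_logits` are names made up for illustration, and the array is assumed to have shape `(examples, sequence length, vocabulary size)`.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def logits_to_token_ids(logits):
    """Pick the highest-scoring vocabulary index at every position."""
    return np.argmax(logits, axis=-1)

# Toy logits: 1 example, 3 positions, vocabulary of 4 tokens.
toy_logits = np.array([[
    [-1.0,  2.0,  0.5, -3.0],
    [ 0.1, -0.2, -5.0,  4.0],
    [ 3.0, -1.0, -2.0,  0.0],
]])

ids_from_logits = logits_to_token_ids(toy_logits)
ids_from_probs = np.argmax(softmax(toy_logits), axis=-1)

print(ids_from_logits.tolist())  # [[1, 3, 0]]
print(bool(np.array_equal(ids_from_logits, ids_from_probs)))  # True
```

With real output, the resulting id array could then be passed to `tokenizer.batch_decode(ids, skip_special_tokens=True)`. Keep in mind that this is argmax over logits from a forward pass, not the result of `generate`, so the text may differ from what `predict_with_generate=True` produces.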