Hi, I am working on a T5 summarizer and would like to know what the output of trainer.predict.predictions refers to. I also saw that we have to use argmax to get the generated summary, but predict.predictions returns a nested array in my case. How do I know which array to use?
This is my code:
# Train trainer
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = T5ForConditionalGeneration.from_pretrained('t5-base')
output_dir = 'output2'

# fine-tune model using the transformers.Trainer API
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    num_train_epochs=6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1,
    prediction_loss_only=True,
    learning_rate=4e-5,
    evaluation_strategy='steps',
    save_steps=1000,
    save_total_limit=1,
    eval_steps=1000,
    load_best_model_at_end=True,
    metric_for_best_model="rouge1",
    predict_with_generate=True,
    push_to_hub=False,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)
trainer.train()
# Evaluate trainer / get summaries
pred_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    per_device_eval_batch_size=8,
    eval_accumulation_steps=1
)
trainer = Seq2SeqTrainer(model=model, args=pred_args)
prediction = trainer.predict(val_dataset)
preds = prediction.predictions
labels = prediction.label_ids
Here preds comes back as a nested array.
Thank you for your help!
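A possible reason `preds` is nested: the second `Seq2SeqTrainer` is built from `pred_args`, which does not set `predict_with_generate=True`, so `predict` most likely returns raw model outputs rather than generated token ids. For T5 that is typically a tuple whose first element is the logits, shaped (num_examples, seq_len, vocab_size), followed by the encoder's last hidden state. With `predict_with_generate=True`, `predictions` should instead be a 2-D array of token ids. The sketch below uses only NumPy and a hypothetical helper, `to_token_ids`, to handle both cases; the arrays in the usage lines are made-up stand-ins for real trainer output.

```python
import numpy as np

def to_token_ids(predictions, pad_token_id=0):
    """Return a 2-D integer array of token ids (hypothetical helper).

    Assumes `predictions` is one of:
      - a tuple whose first element holds the logits,
      - a 3-D array of logits (examples, seq_len, vocab_size),
      - a 2-D array of already generated token ids.
    """
    if isinstance(predictions, tuple):
        predictions = predictions[0]      # keep the logits, drop the rest
    arr = np.asarray(predictions)
    if arr.ndim == 3:
        arr = arr.argmax(axis=-1)         # most likely token per position
    # -100 is the fill value used when padding gathered batches
    arr = np.where(arr == -100, pad_token_id, arr)
    return arr.astype(np.int64)

# Made-up stand-ins for trainer output
rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(4, 7, 11))   # 4 examples, 7 positions, vocab of 11
fake_hidden = rng.normal(size=(4, 5, 3))    # stands in for the encoder hidden state

ids_from_logits = to_token_ids((fake_logits, fake_hidden))
ids_from_generate = to_token_ids(np.array([[5, 2, 1, -100], [3, 1, -100, -100]]))

print(ids_from_logits.shape)   # (4, 7)
print(ids_from_generate)       # [[5 2 1 0] [3 1 0 0]]
```

The resulting ids can be passed to `tokenizer.batch_decode(ids, skip_special_tokens=True)`. Note that an argmax over logits from a single forward pass scores each position given the reference tokens before it, so it is not the same thing as a summary produced by `generate`.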
Hi @paulynlhx, I am also curious about the generation from setting predict_with_generate=True and having to argmax to get the generation. Did you observe any difference between the generation from using argmax when predict_with_generate is True and when predict_with_generate is False? Does predict_with_generate=True give you better output than predict_with_generate=False?
Hi @TopRightExit, it seems that running the trainer without predict_with_generate=True does not return any predictions or labels.
Thank you @paulynlhx.
I was wondering about that because I was using the SFTTrainer from trl, where predict_with_generate does not work, and I was wondering if I should use argmax to get the generation in my compute_metrics function, hence my question to you. It turns out SFTTrainer does not have predict_with_generate, and there were no plans to support it …
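For a trainer without `predict_with_generate`, one option is to take the argmax inside `compute_metrics`. The sketch below is a minimal, NumPy-only illustration with a made-up token-accuracy metric. It assumes a decoder-only (causal) model, where the logits at position t score the token at position t + 1, and that ignored label positions are marked with -100. The toy `logits` and `labels` are invented for the example.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Token-level accuracy from teacher-forced logits (illustrative metric)."""
    logits, labels = eval_pred
    if isinstance(logits, tuple):
        logits = logits[0]                 # some models return extra outputs
    pred_ids = np.argmax(logits, axis=-1)  # (examples, seq_len)

    # Assumption: causal LM, so position t predicts token t + 1.
    pred_ids = pred_ids[:, :-1]
    labels = labels[:, 1:]

    mask = labels != -100                  # skip ignored positions
    correct = (pred_ids == labels) & mask
    return {"token_accuracy": float(correct.sum() / max(mask.sum(), 1))}

# Toy data: one sequence of 4 positions over a vocabulary of 3 tokens.
logits = np.eye(3)[[2, 1, 2, 0]][None, ...]   # argmax per position: 2, 1, 2, 0
labels = np.array([[-100, 2, -100, 0]])

result = compute_metrics((logits, labels))
print(result)   # {'token_accuracy': 0.5}
```

This measures next-token prediction given the reference prefix, not free-running generation, so it is not a drop-in replacement for metrics such as ROUGE computed on generated text. Because full logits can be large, the `preprocess_logits_for_metrics` argument of `Trainer` may be worth a look for doing the argmax before predictions are accumulated.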
@TopRightExit I’m not too sure about the SFTTrainer, but from my code and current understanding (not sure if it's correct), my Seq2SeqTrainer.predict.predictions returns a nested array with 3 layers. The outermost layer is the prediction batch, the middle layer contains the validation set (length of middle layer = number of input examples), and the innermost layer is the tokens.
The output contains negative floats, which may be logits. Hence, I used softmax to get the probabilities and then argmax to get the most probable token index at each position. I then decoded the indices to get the generated summary.
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('t5-base')
preds = trainer.predict(val_dataset).predictions
p1 = preds[0]  # first element of the nested output
generated = []
# Iterate through inputs
for seq_logits in p1:
    # Iterate through the sequence to get the best token at each position
    best_token = []
    for t in range(len(seq_logits)):
        # softmax to change logits to probabilities
        prob = np.exp(seq_logits[t]) / np.sum(np.exp(seq_logits[t]))
        # argmax to get the index with the highest probability
        best_index = np.argmax(prob)
        best_token.append(best_index)
    generated.append(best_token)
decoded = [tokenizer.decode(x, skip_special_tokens=True) for x in generated]
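As a side note on the loop above: softmax is monotonic, so the argmax of the probabilities is the same index as the argmax of the raw logits, and the whole nested loop can be replaced by one vectorized call. The sketch below checks this on made-up random logits. The `softmax` helper is written here for illustration and subtracts the row maximum first, which avoids overflow in `np.exp` for large logits.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Made-up logits: 2 examples, 5 positions, vocabulary of 8 tokens
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 8))

ids_direct = logits.argmax(axis=-1)                # no softmax needed
ids_via_softmax = softmax(logits).argmax(axis=-1)  # same indices

print(ids_direct.shape)                              # (2, 5)
print(np.array_equal(ids_direct, ids_via_softmax))   # True
```

With real output, `ids_direct` would then go to `tokenizer.batch_decode(ids_direct, skip_special_tokens=True)` in place of the per-example `decode` calls.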