What does EvalPrediction.predictions contain exactly?

I want to implement a function for computing metrics and pass it to the Trainer. In the docs, EvalPrediction has two attributes: predictions and label_ids. It is written that both of them are of type ndarray, but this is not the case for me.

The label_ids attribute is correct: it is an ndarray of shape (4, seqlen), where 4 is the number of samples in my validation set. However, the predictions attribute is a tuple?

At index 0, I have an array of shape (3, 4, 56, 32104). 4 is again the number of samples, 56 is the sequence length, and 32104 is the vocabulary size, but what is the 3 then?

At index 1, I first have a tuple/list of tuples of size 4, 6, and then an array of shape (4, 8, 56, 64).

And at index 2, I have an array of shape (4, 78, 512).

What are all these arrays actually? I think this should be clarified in the documentation.

Thanks for your help!

The Trainer will put in predictions everything your model returns (apart from the loss). So if you get multiple arrays, it’s likely because your model returns multiple things. No one can help you determine what they are without seeing your model (which is why you should always post the code you’re using when asking for help :wink: )
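
If you just want to see what you are getting, a quick diagnostic compute_metrics that prints the nested structure can help you match each array back to a model output. This is only a sketch; the function name is made up:

import numpy as np

def inspect_predictions(eval_pred):
    # Recursively print types and shapes of whatever the model returned,
    # so each array can be matched back to a model output.
    def describe(obj, indent=0):
        pad = " " * indent
        if isinstance(obj, np.ndarray):
            print(f"{pad}ndarray {obj.shape}")
        elif isinstance(obj, (tuple, list)):
            print(f"{pad}{type(obj).__name__} of length {len(obj)}")
            for item in obj:
                describe(item, indent + 2)
        else:
            print(f"{pad}{type(obj).__name__}")

    print("predictions:")
    describe(eval_pred.predictions, indent=2)
    print("label_ids:")
    describe(eval_pred.label_ids, indent=2)
    return {}  # no real metrics yet, just inspecting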

Okay, I understand, but I am just using the T5 model from the library, so it is not my own model. I can post the code anyway.

import torch
import argparse
import os
import sys
import numpy as np
import torch.nn.functional as F
sys.path.append('..')
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments
from data_reader import GetDataAsPython
from sklearn.model_selection import train_test_split
from prepare_data import create_data, create_dataset
from transformers import T5Tokenizer

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--epochs', type=int, default=100)
parser.add_argument('-bs', '--batch-size', type=int, default=1)
parser.add_argument('-lr', '--learning-rate', type=float, default=1e-4)
parser.add_argument('-gcv', '--gradient-clip-val', type=float, default=0.0)
parser.add_argument('-wd', '--weight-decay', type=float, default=0.01)
args = parser.parse_args()

# delete the previous logs and results directories
model_name = "t5"
os.system("rm -rf ./logs" + model_name)
os.system("rm -rf ./results_" + model_name)

data = GetDataAsPython('../data_large.json')

train_inputs, train_labels, val_inputs, val_labels, test_inputs, test_labels = create_data(data, ['no-array-constructor'])

# from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained('t5-small')
print('len of tokenizer before adding: ', len(tokenizer))
tokenizer.add_tokens(['{', '}', '<', '>'])
train_dataset = create_dataset(train_inputs, train_labels, tokenizer, True)
val_dataset = create_dataset(val_inputs, val_labels, tokenizer, False)
test_dataset = create_dataset(test_inputs, test_labels, tokenizer, False)


def compute_val_metrics(eval_predictions):
    # print('\n')
    # print(len(eval_predictions.predictions[1]))
    # print(len(eval_predictions.predictions[1][0]))
    # print(eval_predictions.predictions[1][0][0].shape)
    metrics = {}  # placeholder until the contents of predictions are clear

    return metrics

training_args = TrainingArguments(
    output_dir='./results_' + model_name,          
    num_train_epochs=args.epochs,              
    per_device_train_batch_size=args.batch_size,  
    per_device_eval_batch_size=4,   
    warmup_steps=500,                
    weight_decay=args.weight_decay,               
    logging_dir='./logs_' + model_name,
    logging_steps=10,
    do_eval=True,
    evaluation_strategy='epoch',
    learning_rate=args.learning_rate,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    greater_is_better=False,
    # prediction_loss_only=True
)

model = T5ForConditionalGeneration.from_pretrained('t5-small', return_dict=True)
model.resize_token_embeddings(len(tokenizer))
# model.resize maybe depending on tokens

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=[torch.optim.Adam(params=model.parameters(), lr=args.learning_rate), None],       
    tokenizer=tokenizer,
    compute_metrics=compute_val_metrics
)

trainer.train()

The 3 is ['logits', 'past_key_values', 'encoder_last_hidden_state']. This is what the seq2seq models return. logits is what you’ll need for computing metrics; it has shape (bs, seq_len, vocab_size).
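
So a compute_metrics function only needs the first element of that tuple. A minimal sketch, assuming padded label positions are set to -100 (the library’s usual convention):

import numpy as np

def compute_val_metrics(eval_predictions):
    # predictions is a tuple; per the output order above, index 0 holds the
    # logits of shape (bs, seq_len, vocab_size)
    logits = eval_predictions.predictions[0]
    label_ids = eval_predictions.label_ids

    preds = np.argmax(logits, axis=-1)   # greedy token predictions
    mask = label_ids != -100             # ignore padded label positions
    accuracy = (preds == label_ids)[mask].mean()

    return {"token_accuracy": float(accuracy)}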

Also, for training seq2seq models consider using Seq2SeqTrainer, which supports generation during evaluation so you can calculate generative metrics like BLEU, ROUGE, etc.

Check finetune_trainer.py to see how to use Seq2SeqTrainer
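
For reference, here is a rough sketch of what that setup could look like. The Seq2SeqTrainingArguments/predict_with_generate names follow current versions of the library, the exact-match metric is just an example (BLEU/ROUGE would slot in the same way), and model/tokenizer/datasets are reused from the script above:

import numpy as np
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# With predict_with_generate=True, compute_metrics receives generated token ids
# rather than raw logits, so you can decode and compare strings directly.
def compute_generative_metrics(eval_predictions):
    pred_ids = eval_predictions.predictions
    label_ids = np.where(
        eval_predictions.label_ids != -100,
        eval_predictions.label_ids,
        tokenizer.pad_token_id,
    )
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)
    exact = np.mean([p.strip() == l.strip() for p, l in zip(pred_str, label_str)])
    return {"exact_match": float(exact)}

seq2seq_args = Seq2SeqTrainingArguments(
    output_dir='./results_' + model_name,
    per_device_eval_batch_size=4,
    evaluation_strategy='epoch',
    predict_with_generate=True,   # run model.generate() during evaluation
)

seq2seq_trainer = Seq2SeqTrainer(
    model=model,
    args=seq2seq_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_generative_metrics,
)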


Thank you very much! Now it is clear. I think this information should also be added to the documentation. I will also take a look at Seq2SeqTrainer.

Thanks!

I’m not sure why you think the information is not in the documentation. In the T5 model docs, under the returns section, I can see everything. Where do you think it’s missing?


Yes, you are right. I did not pay attention to that. However, models return a lot of things. Do you think it would be easier to return a dictionary, in the format {“encoder_outputs”: Tensor, “logits”: Tensor}?

The models themselves already return that when you pass return_dict=True (which will soon become the default). We can definitely add some code to carry that type of output over to the predictions when the option is selected.
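
For anyone reading along, this is what return_dict=True gives you on a plain forward pass: a named output object instead of a positional tuple. A small illustration, reusing the tokenizer and model from the script above with a toy input/target pair:

import torch

enc = tokenizer("translate English to German: Hello world", return_tensors="pt")
labels = tokenizer("Hallo Welt", return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(**enc, labels=labels, return_dict=True)

print(type(outputs).__name__)                   # Seq2SeqLMOutput
print(list(outputs.keys()))                     # named fields: loss, logits, ...
print(outputs.logits.shape)                     # (batch, target_seq_len, vocab_size)
print(outputs.encoder_last_hidden_state.shape)  # (batch, source_seq_len, d_model)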


We can definitely add some code to carry that type of output over to the predictions when the option is selected.

Am I correct to understand this comment as: “it would be useful for predictions to be a dict when the forward pass returns one with return_dict=True”?

The code that builds the logits which are ultimately fed into the EvalPrediction.predictions attribute is the following:

if isinstance(outputs, dict):
    logits = tuple(v for k, v in outputs.items() if k not in ignore_keys + ["loss"])

So whatever order dict.items() returns determines the order of the components in EvalPrediction.predictions.
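
As a side note, you can already control which dict entries survive this filtering via the ignore_keys argument of evaluate/predict, e.g. to keep only the logits (a sketch, assuming a version of the library that exposes this argument):

# Drop everything except the logits before predictions are assembled.
metrics = trainer.evaluate(
    eval_dataset=val_dataset,
    ignore_keys=["past_key_values", "encoder_last_hidden_state"],
)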

However, relying on order like this increases the cognitive load required to work with the outputs. Additionally, it’s more error prone and can cause all sorts of problems without prior warning, especially if the output is created using a Mapping type that does not provide reliable ordering.

Setting EvalPrediction.predictions to be exactly what the user provided as the output of the forward pass could simplify things considerably.

Would that be a correct assessment?