Prediction format sent to compute_metrics depends on the model used

I am fine-tuning two models with the same workflow. While writing a custom compute_metrics function, I noticed that the format of the prediction object passed to compute_metrics depends on the model. I want to understand where in each model's documentation I could have found that format ahead of time. As it is, I've had to experiment with the function whenever I hit errors in order to make it work.

First, a simple example showing that, when generating tokens and predictions outside of training, the format is the same for both models.

Prediction format outside of fine-tuning

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_deci = "Deci/DeciCoder-1b"
checkpoint_sf = "Salesforce/codegen-350M-mono"
device = 'cpu'

deci = AutoModelForCausalLM.from_pretrained(checkpoint_deci,
                                            trust_remote_code=True).to(device)
tok_deci = AutoTokenizer.from_pretrained(checkpoint_deci)

sf = AutoModelForCausalLM.from_pretrained(checkpoint_sf,
                                          trust_remote_code=True).to(device)
tok_sf = AutoTokenizer.from_pretrained(checkpoint_sf)

# Tokenize with the DeciCoder tokenizer and generate
input_ids_deci = tok_deci("Ok what's going to come next here",
                          return_tensors="pt").to(device)
print(f'Input ids contents: {input_ids_deci.keys()}')
deci.generate(input_ids=input_ids_deci['input_ids'])

Returns:


tensor([[7558, 2769, 1182, 6783,  372, 6539, 2354, 2442,   49,  203,  203,   21,
          701,   77, 1481,   44,  478,  203,   21,  701]])
# Same prompt, this time with the CodeGen tokenizer and model
input_ids_sf = tok_sf("Ok what's going to come next here",
                      return_tensors="pt").to(device)
print(f'Input ids contents: {input_ids_sf.keys()}')
sf.generate(input_ids_sf['input_ids'])

Returns:

tensor([[18690,   644,   338,  1016,   284,  1282,  1306,   994,    30,   198,
         50280,     2,   198, 50280,     2, 50283,     2, 16926,    46,    25]])

So the format is identical across the two models: a 2-D tensor of token ids.
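
Given that, the decoding step is also identical for both. A quick check, reusing the objects defined above:

out_deci = deci.generate(input_ids=input_ids_deci['input_ids'])
out_sf = sf.generate(input_ids_sf['input_ids'])

# Both outputs decode with the same batch_decode call
print(tok_deci.batch_decode(out_deci, skip_special_tokens=True))
print(tok_sf.batch_decode(out_sf, skip_special_tokens=True))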

Format sent to compute_metrics
Now consider the following:

import numpy as np
import evaluate
from transformers import (DataCollatorWithPadding, Trainer,
                          TrainingArguments)

bleu = evaluate.load("bleu")

def compute_bleu_score(pred):
    # The Trainer calls compute_metrics with a single EvalPrediction,
    # so the tokenizer is picked up from the enclosing scope
    logits = pred.predictions
    labels = pred.label_ids
    # If labels use -100 as the ignore index for padded positions, map
    # those back to a real token id before decoding (fall back to eos
    # if the tokenizer has no pad token)
    pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id
    labels = np.where(labels != -100, labels, pad_id)
    # Greedy decode: take the highest-scoring token at each position
    preds_tok = np.argmax(logits, axis=2)
    decode_predictions = tokenizer.batch_decode(preds_tok, skip_special_tokens=True)
    decode_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    res = bleu.compute(predictions=decode_predictions, references=decode_labels)
    return {"bleu_score": res['bleu']}

# Data collator - assembles data into batches for training
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(output_dir='../trainer',
                                  gradient_checkpointing=True,
                                  evaluation_strategy="epoch",
                                  num_train_epochs=3)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
    compute_metrics=compute_bleu_score,
    tokenizer=tokenizer
)

trainer.train()
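
As an aside, the quickest way I found to see what compute_metrics will receive is to temporarily swap in a throwaway function that only prints the structure of pred.predictions. This is a debugging sketch, not part of the real workflow:

def debug_predictions(pred):
    # Print the structure of pred.predictions so the per-model
    # differences show up on the first evaluation pass
    p = pred.predictions
    if isinstance(p, tuple):
        print('tuple:', [getattr(x, 'shape', type(x)) for x in p])
    else:
        print('ndarray:', p.shape)
    return {}

Passing compute_metrics=debug_predictions for a single evaluation pass makes the difference described below visible immediately.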

Now, for DeciCoder, when the trainer sends data to compute_bleu_score, logits will be a 3-dimensional numpy array. For Salesforce CodeGen, however, logits will be a tuple, so I have to modify this function a bit to account for that (my workaround is sketched after the questions below). That is not difficult, but I want to understand:

  1. How do I identify this format ahead of time (i.e., in the model docs)?
  2. What exactly is the second element of the pred.predictions tuple in CodeGen?
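
For reference, here is the workaround. It is a minimal sketch that assumes the logits are the first element whenever pred.predictions arrives as a tuple; that assumption matches what I observed for CodeGen, but it is exactly what I would like to confirm from the docs.

def extract_logits(predictions):
    # Assumption: when predictions is a tuple, the logits come first
    # (matches what I observed for CodeGen); otherwise predictions is
    # already the logits array, as with DeciCoder
    return predictions[0] if isinstance(predictions, tuple) else predictions

In compute_bleu_score, logits = pred.predictions then becomes logits = extract_logits(pred.predictions), and the rest of the function is unchanged.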