Prediction format sent to compute_metrics depends on the model used

I am fine-tuning two models with the same workflow. While writing a custom compute_metrics function, I noticed that the format of the prediction object passed to compute_metrics depends on the model. I want to understand where in each model's documentation I could have found that format ahead of time. As it is, I've had to experiment with the function whenever I hit errors in order to make it work.

First, a simple example showing that, when generating tokens and predictions outside of training, the format is the same for both models.

Prediction format outside of fine-tuning

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_deci = "Deci/DeciCoder-1b"
checkpoint_sf = "Salesforce/codegen-350M-mono"
device = 'cpu'

deci = AutoModelForCausalLM.from_pretrained(checkpoint_deci,
                                            trust_remote_code=True).to(device)
tok_deci = AutoTokenizer.from_pretrained(checkpoint_deci)

sf = AutoModelForCausalLM.from_pretrained(checkpoint_sf,
                                          trust_remote_code=True).to(device)
tok_sf = AutoTokenizer.from_pretrained(checkpoint_sf)

# Tokenize with the DeciCoder tokenizer and generate
input_ids_deci = tok_deci("Ok what's going to come next here",
                          return_tensors="pt").to(device)
print(f'Input ids contents: {input_ids_deci.keys()}')
deci.generate(input_ids=input_ids_deci['input_ids'])

Returns:


tensor([[7558, 2769, 1182, 6783,  372, 6539, 2354, 2442,   49,  203,  203,   21,
          701,   77, 1481,   44,  478,  203,   21,  701]])
# Same prompt, this time with the CodeGen tokenizer and model
input_ids_sf = tok_sf("Ok what's going to come next here",
                      return_tensors="pt").to(device)
print(f'Input ids contents: {input_ids_sf.keys()}')
sf.generate(input_ids_sf['input_ids'])

Returns:

tensor([[18690,   644,   338,  1016,   284,  1282,  1306,   994,    30,   198,
         50280,     2,   198, 50280,     2, 50283,     2, 16926,    46,    25]])

So the format is identical across the two models: a 2-D tensor of token ids.
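
Given that, the decoding step is also identical for both. A quick check, reusing the objects defined above:

out_deci = deci.generate(input_ids=input_ids_deci['input_ids'])
out_sf = sf.generate(input_ids_sf['input_ids'])

# Both outputs decode with the same batch_decode call
print(tok_deci.batch_decode(out_deci, skip_special_tokens=True))
print(tok_sf.batch_decode(out_sf, skip_special_tokens=True))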

Format sent to compute_metrics
Now consider the following:

import numpy as np
import evaluate
from transformers import (DataCollatorWithPadding, Trainer,
                          TrainingArguments)

bleu = evaluate.load("bleu")

def compute_bleu_score(pred):
    # The Trainer calls compute_metrics with a single EvalPrediction,
    # so the tokenizer is picked up from the enclosing scope
    logits = pred.predictions
    labels = pred.label_ids
    # If labels use -100 as the ignore index for padded positions, map
    # those back to a real token id before decoding (fall back to eos
    # if the tokenizer has no pad token)
    pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id
    labels = np.where(labels != -100, labels, pad_id)
    # Greedy decode: take the highest-scoring token at each position
    preds_tok = np.argmax(logits, axis=2)
    decode_predictions = tokenizer.batch_decode(preds_tok, skip_special_tokens=True)
    decode_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    res = bleu.compute(predictions=decode_predictions, references=decode_labels)
    return {"bleu_score": res['bleu']}

# Data collator - assembles data into batches for training
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(output_dir='../trainer',
                                  gradient_checkpointing=True,
                                  evaluation_strategy="epoch",
                                  num_train_epochs=3)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
    compute_metrics=compute_bleu_score,
    tokenizer=tokenizer
)

trainer.train()
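
As an aside, the quickest way I found to see what compute_metrics will receive is to temporarily swap in a throwaway function that only prints the structure of pred.predictions. This is a debugging sketch, not part of the real workflow:

def debug_predictions(pred):
    # Print the structure of pred.predictions so the per-model
    # differences show up on the first evaluation pass
    p = pred.predictions
    if isinstance(p, tuple):
        print('tuple:', [getattr(x, 'shape', type(x)) for x in p])
    else:
        print('ndarray:', p.shape)
    return {}

Passing compute_metrics=debug_predictions for a single evaluation pass makes the difference described below visible immediately.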

Now, for DeciCoder, when the trainer sends data to compute_bleu_score, logits will be a 3-dimensional numpy array. For Salesforce CodeGen, however, logits will be a tuple, so I have to modify this function a bit to account for that (my workaround is sketched after the questions below). That is not difficult, but I want to understand:

  1. How do I identify this format ahead of time (i.e., in the model docs)?
  2. What exactly is the second element of the pred.predictions tuple in CodeGen?
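
For reference, here is the workaround. It is a minimal sketch that assumes the logits are the first element whenever pred.predictions arrives as a tuple; that assumption matches what I observed for CodeGen, but it is exactly what I would like to confirm from the docs.

def extract_logits(predictions):
    # Assumption: when predictions is a tuple, the logits come first
    # (matches what I observed for CodeGen); otherwise predictions is
    # already the logits array, as with DeciCoder
    return predictions[0] if isinstance(predictions, tuple) else predictions

In compute_bleu_score, logits = pred.predictions then becomes logits = extract_logits(pred.predictions), and the rest of the function is unchanged.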