Combine multiple metrics in compute_metrics for validation

DjTobalito · June 3, 2024, 2:07pm

Hi,
I am trying to get precision, recall and accuracy computed each time the model is evaluated during training. My compute_metrics() function is as follows:

import evaluate
import numpy as np

metric1 = evaluate.load("accuracy")
metric2 = evaluate.load("precision")
metric3 = evaluate.load("recall")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    #
    accuracy =  metric1.compute(predictions=predictions, references=labels)["accuracy"]
    precision = metric2.compute(predictions=predictions, references=labels,
                                average="micro")["precision"]
    recall = metric3.compute(predictions=predictions, references=labels,
                                average="micro")["recall"]
    #
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

The TrainingArguments and Trainer are set up as follows:

training_args = TrainingArguments(
    output_dir=str(local_model_pth / model_name),
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="steps",
    eval_steps=200,
    save_steps=20000,
    save_strategy="steps",
    load_best_model_at_end=True,
    push_to_hub=False,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

Everything seems to run fine, but in the evaluation output logged during training, precision, recall and accuracy all have the same value.

I have been through the help here, the Internet and HuggingChat without finding a working solution.
Any ideas?
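A likely explanation for the identical values: in single-label multi-class classification, average="micro" pools true positives, false positives and false negatives over all classes before dividing. Every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro precision and micro recall both reduce to correct / total, which is accuracy. The small NumPy sketch below illustrates this on made-up labels; micro_precision_recall is a hypothetical helper for illustration, not part of the evaluate library.

```python
import numpy as np

# Hypothetical helper for illustration (not an evaluate API):
# micro-averaging pools TP, FP and FN over all classes before dividing.
def micro_precision_recall(predictions, labels, num_classes):
    tp = fp = fn = 0
    for c in range(num_classes):
        pred_c = predictions == c
        true_c = labels == c
        tp += int(np.sum(pred_c & true_c))   # predicted c, truly c
        fp += int(np.sum(pred_c & ~true_c))  # predicted c, truly something else
        fn += int(np.sum(~pred_c & true_c))  # truly c, predicted something else
    return tp / (tp + fp), tp / (tp + fn)

# Made-up single-label predictions for 3 classes (5 of 8 correct).
labels = np.array([0, 1, 2, 2, 1, 0, 2, 1])
predictions = np.array([0, 2, 2, 1, 1, 0, 0, 1])

precision, recall = micro_precision_recall(predictions, labels, num_classes=3)
accuracy = float(np.mean(predictions == labels))
print(precision, recall, accuracy)  # prints 0.625 0.625 0.625
```

Here there are 3 errors, giving 3 false positives and 3 false negatives, so all three metrics come out as 5 / 8.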

DjTobalito · June 4, 2024, 7:47am

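average="micro" turns out to be the problem: for single-label multi-class classification, micro-averaged precision and recall are computed from counts pooled over all classes, so they always equal accuracy. Switching to average="macro" gives distinct values. I also discovered evaluate.combine in the Evaluate library docs.
My implementation: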

import evaluate
import numpy as np

metric = evaluate.combine(["precision", "recall"])
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    #
    return metric.compute(predictions=predictions, references=labels, average="macro")

This implementation does not let you combine accuracy with precision/recall in a multi-class setting. If you add accuracy to the combined metric, you cannot pass average to metric.compute, because accuracy does not accept that argument. If you leave average out, precision/recall fall back to their default, average="binary", which fails on multi-class labels.
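One way around the average clash is to compute accuracy and macro precision/recall in the same compute_metrics without going through evaluate.combine. The sketch below does this directly in NumPy rather than with evaluate; it assumes single-label classification with integer class ids and logits of shape (N, C). Like scikit-learn's default behaviour, it averages over the classes that appear in either the labels or the predictions and scores 0 where a denominator is 0. Calling the three separately loaded metrics from the first post, with average="macro" for precision and recall only, should give the same numbers.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy plus macro-averaged precision/recall (single-label sketch)."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)
    labels = np.asarray(labels)
    # Average only over classes seen in labels or predictions.
    classes = np.union1d(labels, predictions)
    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((predictions == c) & (labels == c))
        predicted_c = np.sum(predictions == c)
        actual_c = np.sum(labels == c)
        # Score 0 when a class is never predicted / never present.
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    return {
        "accuracy": float(np.mean(predictions == labels)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
    }

# Toy check with made-up logits for 3 classes.
logits = np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0], [1.0, 0.5, 0.2]])
labels = np.array([0, 1, 1, 0])
print(compute_metrics((logits, labels)))
```

Because the function returns a plain dict, Trainer logs the values as eval_accuracy, eval_precision and eval_recall at each evaluation step.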
