Get multiple metrics when using the Hugging Face Trainer

Hi all, I’d like to ask if there is any way to get multiple metrics while fine-tuning a model. I’m currently training a model for the GLUE STS task, so I’ve been trying to get the Pearson correlation and F1 score as evaluation metrics. I followed the linked discussion (Log multiple metrics while training) to achieve this, but in the middle of the second training epoch it gave me the following error:

Trainer is attempting to log a value of "{'pearsonr': 0.8609849499038021}" of type <class 'dict'> for key "eval/pearsonr" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.8307692307692308}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-3435b262f1ae> in <module>()
----> 1 trainer.train()

2 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in _save_checkpoint(self, model, trial, metrics)
   1724                 self.state.best_metric is None
   1725                 or self.state.best_model_checkpoint is None
-> 1726                 or operator(metric_value, self.state.best_metric)
   1727             ):
   1728                 self.state.best_metric = metric_value

TypeError: '>' not supported between instances of 'dict' and 'dict'

And this is my compute_metrics code snippet:

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions[:, 0]
    binary_predictions = [1.0 if prediction >= 3.0 else 0.0 for prediction in predictions]
    binary_labels = [1.0 if label >= 3.0 else 0.0 for label in labels]
    pr = metric_pearsonr.compute(predictions=predictions, references=labels)
    f1 = metric_f1.compute(predictions=binary_predictions, references=binary_labels)

    return {"pearsonr": pr, "f1": f1} 

It works fine if I return only one of the metrics, e.g. return pr or return f1. Does anyone have suggestions about this issue? I’d really appreciate it.

The first line of your error message indicates that the Trainer expects a scalar instead of a dictionary (Trainer is attempting to log a value of "{'pearsonr': 0.8609849499038021}" of type <class 'dict'> for key "eval/pearsonr" as a scalar.).

Can you share why you want to return the values in a dictionary and not as values (i.e. why not use return pr, f1)?
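Alternatively, since the Trainer needs compute_metrics to return a flat dict of scalars, and each metric's compute() itself returns a dict, unpacking those dicts should work. A minimal sketch, with plain dicts standing in for the compute() results from the snippet above:

```python
# Stand-ins for what metric_pearsonr.compute() / metric_f1.compute() return
pr = {"pearsonr": 0.8609849499038021}
f1 = {"f1": 0.8307692307692308}

# Nested dicts are what triggered the logging warning:
nested = {"pearsonr": pr, "f1": f1}

# Flat scalars are what the Trainer can log and compare:
flat = {"pearsonr": pr["pearsonr"], "f1": f1["f1"]}
# equivalently: flat = {**pr, **f1}
print(flat)  # {'pearsonr': 0.8609849499038021, 'f1': 0.8307692307692308}
```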

Hi, thanks for the reply. There’s no particular reason I used a dictionary; I just followed the approach in this discussion (Log multiple metrics while training). I also tried return pr, f1 as you suggested, but it showed me another error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-3435b262f1ae> in <module>()
----> 1 trainer.train()

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   2511 
   2512         if all_losses is not None:
-> 2513             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
   2514 
   2515         # Prefix all keys with metric_key_prefix + '_'

TypeError: 'tuple' object does not support item assignment
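Looking at the traceback, return pr, f1 fails because the Trainer assigns the evaluation loss into the object returned by compute_metrics, assuming it is a dict; a tuple rejects that item assignment. A minimal reproduction of the failing line:

```python
# Reproduces trainer.py's
#   metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
metrics = (0.8609, 0.8307)       # what `return pr, f1` produces: a tuple
try:
    metrics["eval_loss"] = 0.25  # the Trainer's item assignment
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment
```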

I don’t know if this helps, but I have implemented multiple metrics without using the Trainer, by modifying the evaluation loop like this:

from datasets import load_metric

accuracy = load_metric("accuracy")
precision = load_metric("precision")
recall = load_metric("recall")
f1 = load_metric("f1")

metrics = [accuracy, precision, recall, f1]

model.eval()
for step, batch in enumerate(eval_dataloader):
    outputs = model(**batch)
    predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze()
    for metric in metrics:
        metric.add_batch(
            predictions=accelerator.gather(predictions),
            references=accelerator.gather(batch["labels"]),
        )

logger.info(f"epoch {epoch+1}: train loss {loss}")
for metric in metrics:
    if metric.name == "accuracy":
        eval_metric = metric.compute()
    else:
        # average=None returns one score per class
        eval_metric = metric.compute(average=None)
    logger.info(f"{eval_metric}")
    if metric.name == "f1":
        # macro-average the per-class F1 scores
        avg_f1 = sum(eval_metric["f1"]) / len(eval_metric["f1"])
        logger.info(f"Average f1: {avg_f1}")