Hi all, I’d like to ask if there is any way to log multiple metrics while fine-tuning a model. I’m currently training a model for the GLUE STS task, so I’ve been trying to use pearsonr and f1 as the evaluation metrics. I followed the approach in this thread (Log multiple metrics while training), but in the middle of the second training epoch it gave me the following error:
Trainer is attempting to log a value of "{'pearsonr': 0.8609849499038021}" of type <class 'dict'> for key "eval/pearsonr" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.8307692307692308}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-3435b262f1ae> in <module>()
----> 1 trainer.train()
2 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in _save_checkpoint(self, model, trial, metrics)
1724 self.state.best_metric is None
1725 or self.state.best_model_checkpoint is None
-> 1726 or operator(metric_value, self.state.best_metric)
1727 ):
1728 self.state.best_metric = metric_value
TypeError: '>' not supported between instances of 'dict' and 'dict'
And this is my compute_metrics code snippet:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Keep only the regression output column
    predictions = predictions[:, 0]
    # Binarize the STS scores at 3.0 so that f1 can be computed
    binary_predictions = [1.0 if prediction >= 3.0 else 0.0 for prediction in predictions]
    binary_labels = [1.0 if label >= 3.0 else 0.0 for label in labels]
    pr = metric_pearsonr.compute(predictions=predictions, references=labels)
    f1 = metric_f1.compute(predictions=binary_predictions, references=binary_labels)
    return {"pearsonr": pr, "f1": f1}
It works fine if I use only one of the metrics, i.e. return pr or return f1 on its own. Does anyone have suggestions about this issue? I’d really appreciate it.
The first line in your error message indicates that it expects a scalar instead of a dictionary (Trainer is attempting to log a value of "{'pearsonr': 0.8609849499038021}" of type <class 'dict'> for key "eval/pearsonr" as a scalar.).
Can you share why you want to return the values in a dictionary and not as values (i.e. why not use return pr, f1)?
Hi, thanks for the reply. There’s no particular reason I used a dictionary; I just followed the approach in this discussion (Log multiple metrics while training). I also tried return pr, f1 as you suggested, but it gave me another error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-3435b262f1ae> in <module>()
----> 1 trainer.train()
3 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
2511
2512 if all_losses is not None:
-> 2513 metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
2514
2515 # Prefix all keys with metric_key_prefix + '_'
TypeError: 'tuple' object does not support item assignment
I know this thread is already old, but the reason for the error is that pr and f1 are both dicts: compute() returns a dictionary such as {'pearsonr': 0.86}, not a bare number. To log multiple metrics, unwrap the scalars and return {"pearsonr": pr['pearsonr'], "f1": f1['f1']}.
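For completeness, here is what the fixed compute_metrics could look like. This is a minimal sketch that assumes the two metric objects are created with datasets.load_metric (the original post does not show that setup); the [:, 0] slicing and the 3.0 threshold are taken from the original snippet:

from datasets import load_metric

# Assumed setup, not shown in the original post
metric_pearsonr = load_metric("pearsonr")
metric_f1 = load_metric("f1")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Keep only the regression output column
    predictions = predictions[:, 0]
    # Binarize the regression outputs at 3.0 so f1 is defined
    binary_predictions = [1.0 if p >= 3.0 else 0.0 for p in predictions]
    binary_labels = [1.0 if l >= 3.0 else 0.0 for l in labels]
    pr = metric_pearsonr.compute(predictions=predictions, references=labels)
    f1 = metric_f1.compute(predictions=binary_predictions, references=binary_labels)
    # compute() returns a dict, so unwrap the scalars before returning
    return {"pearsonr": pr["pearsonr"], "f1": f1["f1"]}

With plain floats in the returned dict, the Trainer can log eval/pearsonr and eval/f1 as scalars, and the best-model comparison in _save_checkpoint (which raised the "'>' not supported" error above) works again.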