Great!
I think the simplest way to track both accuracy and F1 score would be to first load them independently:
from datasets import load_metric

accuracy_score = load_metric('accuracy')
f1_score = load_metric('f1')
Then you can include them both in the compute_metrics function by returning a single dict that merges their results:
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    # returns a dict like {'f1': 0.54221}
    f1 = f1_score.compute(predictions=predictions, references=labels)
    # returns a dict like {'accuracy': 0.3241}
    acc = accuracy_score.compute(predictions=predictions, references=labels)
    # merge the two dicts into one, e.g. {'f1': ..., 'accuracy': ...}
    return {**f1, **acc}
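
If it helps, here is a minimal sketch of how this plugs into the Trainer; the model, tokenizer, and dataset variables are just placeholders for whatever you are already using:

from transformers import Trainer, TrainingArguments

# Hypothetical setup: swap in your own output dir, model, and tokenized datasets.
training_args = TrainingArguments(
    output_dir='output',
    evaluation_strategy='epoch',  # run compute_metrics at the end of each epoch
)

trainer = Trainer(
    model=model,                      # your model (placeholder)
    args=training_args,
    train_dataset=train_dataset,      # your tokenized train split (placeholder)
    eval_dataset=eval_dataset,        # your tokenized validation split (placeholder)
    compute_metrics=compute_metrics,  # both 'f1' and 'accuracy' will appear in the eval logs
)

trainer.train()

With this setup, every evaluation pass should report both metrics together in the logs and in trainer.evaluate() output.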