Error in fine-tuning BERT

Great 🥳!

I think the simplest way to track both accuracy and F1 score would be to first load them independently:

import numpy as np
from datasets import load_metric

accuracy_score = load_metric('accuracy')
f1_score = load_metric('f1')

Then you can include both in the compute_metrics function by returning a single dict that merges their entries:

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    # returns a dict like {'f1':0.54221}
    f1 = f1_score.compute(predictions=predictions, references=labels)
    # returns a dict like {'accuracy': 0.3241}
    acc = accuracy_score.compute(predictions=predictions, references=labels)
    # merge the two dictionaries
    return {**f1, **acc}
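
If you then pass compute_metrics to your Trainer, both metrics will show up in the evaluation logs. A minimal sketch of the wiring, assuming model, train_dataset, and eval_dataset are already defined elsewhere in your fine-tuning script (those names are placeholders):

from transformers import Trainer, TrainingArguments

# output_dir here is just an example path
training_args = TrainingArguments(output_dir='bert-finetuned')

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

# trainer.evaluate() will then report the merged metrics
# as eval_f1 and eval_accuracy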