Loss Increases But Metrics Get Better?

Hi :grinning_face_with_smiling_eyes:
I was experimenting with the Token Classification Example Notebook and encountered something odd.

This notebook fine-tunes DistilBERT on the WNUT-17 dataset. I increased the number of training epochs from the original 2 to 10, and the results are as follows:

As can be seen from the picture above, during fine-tuning the validation loss increases (probably a sign of overfitting) while the metrics keep improving.

Can someone explain this phenomenon?
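
To make the question concrete, here is a toy sketch of how the two trends could coexist arithmetically. The numbers are made up and are not the notebook's outputs. It assumes a simplified binary setup where a prediction counts as correct when the probability assigned to the true label exceeds 0.5; `mean_cross_entropy` and `accuracy` are hypothetical helper names, not functions from the notebook.

```python
import math

def mean_cross_entropy(p_true):
    # Average negative log-likelihood of the true label.
    return sum(-math.log(p) for p in p_true) / len(p_true)

def accuracy(p_true):
    # Binary assumption: correct when the true label gets more than 0.5.
    return sum(p > 0.5 for p in p_true) / len(p_true)

# Made-up probabilities assigned to the true label of four tokens.
early = [0.60, 0.60, 0.45, 0.45]  # cautious predictions, two of them wrong
late = [0.95, 0.95, 0.55, 0.01]   # more correct, but one confidently wrong

early_loss, early_acc = mean_cross_entropy(early), accuracy(early)
late_loss, late_acc = mean_cross_entropy(late), accuracy(late)

print(f"early: loss={early_loss:.3f}, accuracy={early_acc:.2f}")
print(f"late:  loss={late_loss:.3f}, accuracy={late_acc:.2f}")
```

In this sketch the later predictions get more tokens right, yet a single very confident mistake dominates the average loss. Whether that is what actually happens in my run, I don't know.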