I was experimenting with the Token Classification Example Notebook and encountered something odd.
This notebook fine-tunes DistilBERT on the WNUT-17 dataset. I increased the number of training epochs from the original 2 to 10, and the results are as follows:
As the picture above shows, the validation loss increases during fine-tuning (probably a case of overfitting) while the metrics improve.
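To make the observation concrete, here is a toy sketch showing that a mean cross-entropy loss and an accuracy-style metric can, in principle, move in opposite directions. The probabilities are made up for illustration and are not taken from the notebook's outputs. The helper names (`mean_cross_entropy`, `accuracy`) and the simplification to a two-class problem with a 0.5 threshold are my own assumptions, so I am not sure this is what is actually happening in my run.

```python
import math

def mean_cross_entropy(probs_of_true_class):
    """Average negative log-likelihood assigned to the correct class."""
    return -sum(math.log(p) for p in probs_of_true_class) / len(probs_of_true_class)

def accuracy(probs_of_true_class, threshold=0.5):
    """Two-class simplification: a token counts as correct when the
    probability of its true class exceeds the threshold."""
    return sum(p > threshold for p in probs_of_true_class) / len(probs_of_true_class)

# Hypothetical probabilities given to the correct class for four tokens.
early = [0.6, 0.6, 0.4, 0.4]      # hesitant model: 2 of 4 tokens correct
late = [0.99, 0.99, 0.99, 0.001]  # confident model: 3 of 4 correct, 1 badly wrong

early_loss, late_loss = mean_cross_entropy(early), mean_cross_entropy(late)
early_acc, late_acc = accuracy(early), accuracy(late)

print(f"early: loss={early_loss:.3f}, accuracy={early_acc:.2f}")
print(f"late:  loss={late_loss:.3f}, accuracy={late_acc:.2f}")
```

In this toy case the single confidently wrong token dominates the average loss, while the metric only counts how many tokens end up on the right side of the threshold.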
Can someone explain this phenomenon?