BERT for text classification evaluation - help needed

Hi all - I'm fine-tuning BERT for a downstream text classification task. Is plotting the training loss against the eval loss, and tracking accuracy or F1 score, enough of an evaluation framework before putting the model into production?
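
To make the question concrete, here is a rough sketch of the checks I have in mind. It is plain Python with made-up labels and loss values (nothing here comes from a real run), and the helper names `accuracy`, `macro_f1`, and `best_eval_epoch` are just illustrative, not from any library.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_eval_epoch(eval_losses):
    """Index of the epoch with the lowest eval loss."""
    return min(range(len(eval_losses)), key=eval_losses.__getitem__)


# Made-up values, purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
train_losses = [0.90, 0.60, 0.40, 0.25, 0.15]
eval_losses = [0.80, 0.60, 0.55, 0.60, 0.70]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
best = best_eval_epoch(eval_losses)
# Train loss keeps falling after `best` while eval loss rises: a sign of overfitting.
final_gap = eval_losses[-1] - train_losses[-1]

print(f"accuracy={acc:.3f} macro_f1={f1:.3f} best_epoch={best} final_gap={final_gap:.2f}")
```

In this toy case the eval loss bottoms out at epoch index 2 while the training loss keeps dropping, which is the pattern I would read off the loss chart as overfitting.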

I also plan to test the model once it is properly tuned. Any thoughts?
Cheers, DB.