Early Stopping with GPT from AutoModelForCausalLM

I am trying to use an evaluation set to implement early stopping for my model to help prevent overfitting.
The training runs, but I receive a message:

" early stopping required metric_for_best_model, but did not find eval_f1 so early stopping is disabled."

I couldn’t find any resources on this. Am I interpreting this wrong?

Which part of your code incorporates early stopping?
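
In the meantime, here is my guess at what the warning means. As far as I understand it, the early stopping callback takes metric_for_best_model, prepends "eval_" if it is missing, and looks that key up in the metrics returned by evaluation. A causal LM trained without a compute_metrics function normally reports only the loss (plus timing figures), so there is no eval_f1 to find. The snippet below is a simplified imitation of that lookup, not the library's actual code; resolve_metric and the numbers in eval_metrics are made up for illustration.

```python
def resolve_metric(metric_for_best_model, metrics):
    """Simplified imitation of the metric lookup done for early stopping."""
    key = metric_for_best_model
    # The configured name gets an "eval_" prefix if it does not have one yet.
    if not key.startswith("eval_"):
        key = f"eval_{key}"
    # A missing key yields None, which is the case that triggers the warning.
    return key, metrics.get(key)


# Hypothetical evaluation output for a causal LM without compute_metrics.
eval_metrics = {"eval_loss": 2.31, "eval_runtime": 12.4}

print(resolve_metric("f1", eval_metrics))    # no eval_f1 key, so the value is None
print(resolve_metric("loss", eval_metrics))  # eval_loss is present
```

If that matches your setup, setting metric_for_best_model="eval_loss" together with greater_is_better=False in your TrainingArguments should make the warning go away. Alternatively, pass a compute_metrics function that returns an "f1" entry, although F1 is an unusual choice for language modeling.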