WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time when adding rouge-score

I was using the translation notebook and it was working well, but when I added a metric to report ROUGE, as in the summarization notebook, I got this warning.

import nltk
import numpy as np

def metric_fn(eval_predictions):
    predictions, labels = eval_predictions
    decoded_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    for label in labels:
        label[label < 0] = tokenizer.pad_token_id  # Replace masked label tokens
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # ROUGE expects a newline after each sentence
    decoded_predictions = [
        "\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_predictions
    ]
    decoded_labels = [
        "\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels
    ]
    result_metric_rouge = metric_rouge.compute(
        predictions=decoded_predictions, references=decoded_labels, use_stemmer=True
    )
    # Extract a few results
    result_metric_rouge = {
        key: value.mid.fmeasure * 100 for key, value in result_metric_rouge.items()
    }
    # Add mean generated length
    prediction_lens = [
        np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions
    ]
    result_metric_rouge["gen_len"] = np.mean(prediction_lens)

    return result_metric_rouge
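As a side note, the in-place masking step in `metric_fn` can be checked on a toy array (the `-100` sentinel and the pad id of 0 here are illustrative assumptions, matching the convention used in the notebooks):

```python
import numpy as np

# Toy labels: negative values mark positions ignored by the loss
labels = np.array([[15, 27, -100, -100],
                   [42,  8,    3, -100]])
pad_token_id = 0  # assumed pad id for illustration

for label in labels:
    label[label < 0] = pad_token_id  # same in-place replacement as in metric_fn

print(labels.tolist())  # [[15, 27, 0, 0], [42, 8, 3, 0]]
```

After this replacement, `batch_decode(..., skip_special_tokens=True)` can safely drop the padding instead of choking on negative ids.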
from tensorflow.keras.callbacks import TensorBoard
from transformers.keras_callbacks import PushToHubCallback, KerasMetricCallback

username = "MaryaAI"

tensorboard_callback = TensorBoard(log_dir="./translation_model_save/logs")

metric_callback = KerasMetricCallback(
    metric_fn, eval_dataset=generation_dataset, predict_with_generate=True
)

push_to_hub_callback = PushToHubCallback(
    # arguments omitted in the original post
)

callbacks = [metric_callback, tensorboard_callback]  # , push_to_hub_callback]

model.fit(
    train_dataset, validation_data=validation_dataset, epochs=2, callbacks=callbacks
)

WARNING:tensorflow:Callback method on_train_batch_end is slow compared to the batch time

And my code seems to go into an infinite loop, even though each notebook works well on its own.

Note: I increased the batch size, but the result is the same.

I was also wondering: why not simply compute ROUGE once after training, as in previous versions?
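For context on what the callback is recomputing at every evaluation: ROUGE-1 F1 is just unigram overlap between a prediction and a reference. Here is a minimal stdlib-only sketch of that core idea (this is not the `rouge_score` package's implementation, which adds its own tokenization, optional stemming, and score aggregation):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1, the core of ROUGE-1 (whitespace tokens, no stemming)."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    overlap = sum((pred_counts & ref_counts).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat", "the cat sat on the mat"), 3))  # 0.667
```

Since the score only depends on the final generated texts, nothing in principle prevents running generation and this computation once after training rather than inside a per-epoch callback; the callback approach just lets you watch the metric evolve during training.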