How to evaluate the performance of a BERT model trained from scratch?

I am training a BERT model from scratch on my own corpus, following this blog post: https://huggingface.co/blog/how-to-train

# Define config for the model
from transformers import BertConfig, BertForMaskedLM, Trainer, TrainingArguments

config = BertConfig(
    vocab_size=32000,
    max_position_embeddings=1024,
    num_attention_heads=12,
    num_hidden_layers=12,
    type_vocab_size=2,
    hidden_act="gelu",
    intermediate_size=3072,
    hidden_dropout_prob=0.1,
    hidden_size=768,
    initializer_range=0.02,
    attention_probs_dropout_prob=0.1,
)

model = BertForMaskedLM(config=config)

training_args = TrainingArguments(
    output_dir="./bert",
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    save_steps=1000,
    save_total_limit=2,
    do_train=True,                       
    do_eval=True, 
    logging_steps=1000,
    eval_steps=None,
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Start training
trainer.train()
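
For reference, the data_collator, train_dataset and eval_dataset passed to the Trainer are built roughly as in the blog post (simplified here; the tokenizer path, file paths and block_size are placeholders rather than my exact values):

    from transformers import BertTokenizerFast, DataCollatorForLanguageModeling, LineByLineTextDataset

    # Tokenizer trained on the same corpus and saved to disk beforehand
    tokenizer = BertTokenizerFast.from_pretrained("./tokenizer")

    # Mask 15% of tokens for the masked language modeling objective
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )

    # 95% / 5% split of the corpus, one text file per split
    train_dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="./train.txt", block_size=128)
    eval_dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="./val.txt", block_size=128)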

Now that training has finished, I have a few questions that I hope someone can shed light on:

  1. How can I evaluate the performance of my unsupervised (masked language modeling) trained model? Is there a way to see the validation loss or the perplexity score? At the moment, this is the only thing I see:
    TrainOutput(global_step=804310, training_loss=2.2400301857170966)

I set eval_dataset and do_eval in the training arguments, but I am not sure what the next step would be to obtain the model's performance; my best guess is the sketch just below. My train set is 95% of the corpus and my validation set is the remaining 5%.
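
This is my attempt, based on my reading of the Trainer docs, so it may well be wrong - I assume trainer.evaluate() returns the loss on eval_dataset and that exponentiating it gives the perplexity:

    import math

    # Run evaluation on eval_dataset and read off the masked-LM loss
    eval_output = trainer.evaluate()
    eval_loss = eval_output["eval_loss"]

    # Perplexity of a language model is exp(cross-entropy loss)
    print(f"Validation loss: {eval_loss:.4f}")
    print(f"Perplexity: {math.exp(eval_loss):.2f}")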

  2. For now, I have trained the model for 5 epochs. Hypothetically, if I want to keep training it for 5 more epochs (after the first 5 are done), what would be the best way to continue training without starting from scratch again? Do I simply run trainer.train() again, or something like the sketch below?
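
What I have in mind is something like the following (again just a guess - I am not sure whether reloading the latest checkpoint or passing resume_from_checkpoint is the intended way, and the checkpoint folder name below is only illustrative):

    # Option A: reload the latest checkpoint and run another 5 epochs with a new Trainer
    model = BertForMaskedLM.from_pretrained("./bert/checkpoint-804000")  # illustrative checkpoint name
    trainer = Trainer(
        model=model,
        args=training_args,          # same arguments as above, num_train_epochs=5
        data_collator=data_collator,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()

    # Option B: newer transformers versions seem to allow resuming in place
    # trainer.train(resume_from_checkpoint=True)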

  3. My corpus is only about 1.5 GB - what would be an ideal number of epochs to train for?

Apologies for the multiple questions, but I’m quite new to deep learning and language models, so I feel like I’m missing the big picture here. Many thanks in advance for your insights!