Metrics to use when fine-tuning a masked language model (MLM)

I am training an MLM model using the Hugging Face Trainer API with PyTorch. Here is my initial code.

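import torch
import transformers as tr
from transformers import DataCollatorForWholeWordMask

# `tokenizer`, `model`, and `train_encodings` are assumed to be defined earlier.
# Masks 15% of the input, selecting whole words rather than individual sub-word tokens.
data_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)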


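class SEDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output (a dict of lists) as a PyTorch dataset."""

    def __init__(self, encodings):
        self.encodings = encodings

    def __getitem__(self, idx):
        # Return one example as a dict of tensors; labels are created by the data collator.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        return item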

    def __len__(self):
        return len(self.encodings["attention_mask"])

train_data = SEDataset(train_encodings)
print("train_data prepared")


training_args = tr.TrainingArguments(
    output_dir='results_mlm_mmt2',
    logging_dir='logs_mlm_mmt2',  # directory for storing logs
    save_strategy="epoch",
    learning_rate=2e-5,
    logging_steps=40000,
    overwrite_output_dir=True,
    num_train_epochs=10,
    per_device_train_batch_size=32,
    prediction_loss_only=True,
    gradient_accumulation_steps=2,
    fp16=True,
)



trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data
)

The above code works fine, but I want to add a few things:

  1. How can I add validation and test text data, and in what format? Do
    I also need to pass labels for the validation set?

  2. How can I have MLM-related metrics printed after every N steps?

  3. How can I test my trained model?
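
For question 2, one metric I am considering is perplexity, which is the exponential of the mean cross-entropy loss over the masked tokens. Below is a minimal sketch of the conversion, assuming the evaluation loss is available under the key `eval_loss` (as in the dict returned by `trainer.evaluate()`). The helper name `perplexity_from_loss` and the loss value are made up for illustration.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(eval_loss)


# Hypothetical metrics dict, shaped like the output of trainer.evaluate().
# The value 2.0 is a placeholder, not a real result.
metrics = {"eval_loss": 2.0}

ppl = perplexity_from_loss(metrics["eval_loss"])
print(f"perplexity: {ppl:.2f}")
```

Because `prediction_loss_only=True` is set above, evaluation would report only the loss, so this conversion could be applied directly to the logged value.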