Which metrics to use when fine-tuning a masked language model (MLM)?

I am training an MLM model using the Hugging Face Trainer API with PyTorch. Here is my initial code.

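import torch
import transformers as tr
from transformers import DataCollatorForWholeWordMask

# tokenizer is assumed to be loaded earlier
data_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)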

class SEDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        return item

    def __len__(self):
        return len(self.encodings["attention_mask"])

train_data = SEDataset(train_encodings)
print("train_data prepared")

training_args = tr.TrainingArguments(
    # ... (other arguments omitted)
    logging_dir='logs_mlm_mmt2',  # directory for storing logs
)

trainer = tr.Trainer(
    # ... (arguments omitted)
)

The above code works fine, but I want to add a few things:

  1. How can I add validation and test text data, and in what format? Do
    I also need to pass labels for the validation set?

  2. How can I get MLM-related metrics printed after every fixed number
    of steps?

  3. How can I test my trained model?
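
To make question 2 concrete, the metrics I have in mind are perplexity (the exponential of the mean masked-token cross-entropy loss) and accuracy on the masked positions. Below is a minimal, library-free sketch of what I would like to see reported. It assumes, as far as I understand the collator, that unmasked positions get the label -100 and should not be scored. The function names `perplexity` and `masked_accuracy` are my own and not part of any library.

```python
import math

IGNORE_INDEX = -100  # assumed label value for positions that were not masked


def perplexity(mean_loss):
    """Perplexity is exp of the mean cross-entropy loss over masked tokens."""
    return math.exp(mean_loss)


def masked_accuracy(predicted_ids, labels, ignore_index=IGNORE_INDEX):
    """Fraction of masked positions where the predicted token id equals the label."""
    correct = 0
    total = 0
    for pred_row, label_row in zip(predicted_ids, labels):
        for pred, label in zip(pred_row, label_row):
            if label == ignore_index:
                continue  # position was not masked, so it is not scored
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


# Toy example: two sequences, three masked positions, two predicted correctly.
preds = [[5, 7, 9], [2, 4, 6]]
labels = [[-100, 7, 8], [2, -100, -100]]
print(masked_accuracy(preds, labels))
print(perplexity(0.0))
```

What I do not know is how to get numbers like these out of the Trainer during training, which is what I am asking above.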