How is loss calculated in MLM training?

I am training an MLM model based on XLM-RoBERTa large.

Here is the standard code:

import torch
import pandas as pd
import transformers as tr

tokenizer = tr.XLMRobertaTokenizer.from_pretrained("xlm-roberta-large", local_files_only=True)
model = tr.XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-large", return_dict=True, local_files_only=True)

df = pd.read_csv("training_data_multilingual.csv")
train_df = df.message_text.tolist()
train_df = list(set(train_df))                       # drop duplicates
train_df = [x for x in train_df if str(x) != 'nan']  # drop NaN rows

train_encodings = tokenizer(train_df, truncation=True, padding=True, max_length=512, return_tensors="pt")

class SEDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
        
    def __getitem__(self, idx):
        # values are already tensors (return_tensors="pt"), so just index them
        item = {key: val[idx] for key, val in self.encodings.items()}
        return item

    def __len__(self):
        return len(self.encodings["attention_mask"])

train_data = SEDataset(train_encodings)

# print("train data created")

training_args = tr.TrainingArguments(

     output_dir='results_mlm_vocab_exp'
    ,logging_dir='logs_mlm_vocab_exp'        # directory for storing logs
    ,save_strategy="epoch"
    ,learning_rate=2e-5
    ,logging_steps=6000
    ,overwrite_output_dir=True
    ,num_train_epochs=10
    ,per_device_train_batch_size=2
    ,prediction_loss_only=True
    ,gradient_accumulation_steps=4
    ,bf16=True  # requires an Ampere (or newer) GPU
    ,optim="adamw_hf"
)


# the collator does the random masking of input tokens for MLM
data_collator = tr.DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data
)

trainer.train()

I have a few questions related to this:

  • How is the loss calculated in MLM training? During training I see logs like {'loss': 1.6117, 'learning_rate': 1.751861042183623e-05, 'epoch': 2.48}. I guess this is the training loss? If so, how is it calculated?
  • How do I pass validation data? Does it go inside TrainingArguments, and is it formatted the same way as the training data?
  • Does it make sense to compute precision, recall, and F1 score for training and validation data in MLM training? If so, how can I achieve it using Trainer?
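On the first bullet, my current understanding (happy to be corrected) is that the data collator sets the labels of all unmasked tokens to -100, so the reported loss is the average cross-entropy over just the masked positions. A toy illustration with made-up numbers:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: vocab of 5 tokens, sequence of length 4
vocab_size, seq_len = 5, 4
logits = torch.randn(1, seq_len, vocab_size)

# Labels: -100 means "not masked, ignore"; only position 2 was masked,
# and its original token id was 3.
labels = torch.tensor([[-100, -100, 3, -100]])

# This mirrors what the masked-LM head does internally
loss = F.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)

# Equivalent by hand: negative log-probability of the true token
# at the single masked position
manual = -F.log_softmax(logits[0, 2], dim=-1)[3]
print(torch.allclose(loss, manual))  # True
```

If that is right, then the number printed in the training logs is just this cross-entropy averaged over all masked tokens in the batch.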