Early Stopping using PPL for an XLMRobertaForMaskedLM model

Hello,

I want to continue training an XLMRobertaForMaskedLM model on my own data. I want to use early stopping during evaluation to choose the best model, but I think cross-entropy loss alone is not good enough, so I want to use PPL in the form of math.exp(eval_results['eval_loss']).

1. I only know that metric_for_best_model='eval_loss' works; when I tried to use metric_for_best_model=math.exp(eval_results['eval_loss']) instead, it failed.
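As far as I can tell, metric_for_best_model only accepts the name of a metric returned by compute_metrics (the Trainer prepends eval_ to it), so I imagine the wiring would have to look roughly like this, where the 'perplexity' key name is just my guess:

training_args = TrainingArguments(
    output_dir=cur_output_dir,
    evaluation_strategy='steps',
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model='perplexity',  # looked up as 'eval_perplexity'
    greater_is_better=False,             # lower perplexity is better
)
# ...plus a compute_metrics function that returns {'perplexity': ...},
# which is what I try (and fail) to write in point 2 below.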

Then I looked at the Hugging Face code; the MLM loss is originally defined as:

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
2. I tried to write a compute_metrics function, shown below, but it failed; the shapes of preds and labels are both (20, 120):

Where CODE is loss_fct(preds.view(-1, n), labels.view(-1)), I get the error "ValueError: Expected input batch_size (120) to match target batch_size (2400)".

Where CODE is loss_fct(preds.view(-1, n), labels.view(-1, n)), I get a loss value of about 900 million, which must be wrong.

def my_metrics(eval_pred):
    labels = eval_pred.label_ids
    preds = eval_pred.predictions.argmax(-1)
    n = labels.shape[0]
    print("*"*80)
    print(labels.shape)
    print(preds.shape)
    labels = torch.from_numpy(labels).float()
    preds = torch.from_numpy(preds).float()
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        masked_lm_loss = CODE  # placeholder; the two variants I tried are described above
    loss = masked_lm_loss
    PPL = math.exp(loss)
    print(f'eval_loss\n{loss}')

3. After that I tried to define a new Trainer subclass that treats the loss as PPL, but it failed because:

prediction_scores = self.lm_head(sequence_output)
AttributeError: 'MyTrainer' object has no attribute 'lm_head'

I think one of these approaches could work, but I don't know how to implement it. Can someone help me?

Thank you in advance!

Here is the training code:

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        sequence_output = outputs[0]
        prediction_scores = self.lm_head(sequence_output)

pos_training_args = TrainingArguments(
    output_dir=cur_output_dir,
    num_train_epochs=args.train_epochs,
    per_device_train_batch_size=8,
    logging_steps=100,
    save_total_limit=3,
    evaluation_strategy='steps',
    eval_steps=50,
    learning_rate=2e-5,
    warmup_steps=pos_warmup_steps,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    disable_tqdm=False,
    gradient_accumulation_steps=4
)
    
trainer = Trainer(
    model=pos_model,
    data_collator=pos_collator,
    args=pos_training_args,
    train_dataset=pos_train_dataset,
    eval_dataset=pos_eval_dataset,
    compute_metrics=my_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
)

trainer.train()

Hi @kk12, self here refers to the Trainer instance, which does not have an lm_head. What you probably want is self.model.lm_head(sequence_output). Let me know if that works!

Hi lvwerra, thank you for your reply!

But it didn’t work. The error was:
RuntimeError: both arguments to matmul need to be at least 1D, but they are 0D and 2D

I built a new trainer that is almost the same as the original definition and uses self.model.lm_head(sequence_output) as you suggested:

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        sequence_output = outputs[0]
        prediction_scores = self.model.lm_head(sequence_output)

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        return masked_lm_loss

Then I set up the trainer as:

    cont_pre_trainer = MyTrainer(
        model=cont_pre_model,
        args=cont_pre_training_args,
        train_dataset=cont_pre_dataset,
        data_collator=cont_pre_collator,
        eval_dataset=eval_dataset
    )

It’s hard to know where the error comes from without a minimal example reproducing the error. It sounds like some of your shapes are not as they should be. Two comments:

On your point 2, you mentioned that the ~900M loss is way too high. I think it can be that high if you start training from scratch and the loss is very large.
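That said, if I read your my_metrics right, the loss there is computed from the argmax of the predictions instead of the raw logits. CrossEntropyLoss expects logits of shape (num_tokens, vocab_size) and integer targets of shape (num_tokens,), with -100 marking positions to ignore, so a perplexity metric would look roughly like this (just a sketch, assuming eval_pred.predictions are the unmodified logits):

import math
import torch
from torch.nn import CrossEntropyLoss

def my_metrics(eval_pred):
    logits = torch.from_numpy(eval_pred.predictions)  # (batch, seq_len, vocab_size)
    labels = torch.from_numpy(eval_pred.label_ids)    # (batch, seq_len), -100 on non-masked tokens
    loss_fct = CrossEntropyLoss()                     # ignore_index defaults to -100
    loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
    return {"perplexity": math.exp(loss.item())}

Note that this keeps the full logits for the whole eval set in memory, which can get large with the XLM-R vocabulary.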

Also, why aren't you using the built-in early stopping callback, where you can stop based on the loss? Since PPL is just exp(loss), this would be equivalent to using PPL.
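For example, something along these lines (reusing the names from your snippet) should behave the same as early stopping on perplexity; take it as a sketch rather than tested code:

import math
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

pos_training_args = TrainingArguments(
    output_dir=cur_output_dir,
    evaluation_strategy='steps',
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',  # exp() is monotonic, so best loss == best PPL
    greater_is_better=False,
)

trainer = Trainer(
    model=pos_model,
    args=pos_training_args,
    data_collator=pos_collator,
    train_dataset=pos_train_dataset,
    eval_dataset=pos_eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)

trainer.train()
eval_results = trainer.evaluate()
print(f"perplexity: {math.exp(eval_results['eval_loss']):.2f}")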