Early Stopping using PPL for an XLMRobertaForMaskedLM model

Hello,

I want to continue training an XLMRobertaForMaskedLM model on my own data. I want to use early stopping during evaluation to choose the best model, but I think cross-entropy loss alone is not good enough, so I want to use PPL in the form of math.exp(eval_results['eval_loss']).

1. I only know that metric_for_best_model='eval_loss' works; when I tried to use metric_for_best_model=math.exp(eval_results['eval_loss']) instead, it failed.
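As far as I can tell, metric_for_best_model only accepts the name of a metric returned by compute_metrics (the Trainer prepends eval_ to it), so I imagine the wiring would have to look roughly like this, where the 'perplexity' key name is just my guess:

training_args = TrainingArguments(
    output_dir=cur_output_dir,
    evaluation_strategy='steps',
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model='perplexity',  # looked up as 'eval_perplexity'
    greater_is_better=False,             # lower perplexity is better
)
# ...plus a compute_metrics function that returns {'perplexity': ...},
# which is what I try (and fail) to write in point 2 below.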

Then I looked at the Hugging Face code; the MLM loss is originally defined as:

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
2. I tried to write a compute_metrics function, shown below, but it failed; the shapes of preds and labels are both (20, 120):

Where CODE is loss_fct(preds.view(-1, n), labels.view(-1)), I get the error "ValueError: Expected input batch_size (120) to match target batch_size (2400)".

Where CODE is loss_fct(preds.view(-1, n), labels.view(-1, n)), I get a loss value of about 900 million, which must be wrong.

def my_metrics(eval_pred):
    labels = eval_pred.label_ids
    preds = eval_pred.predictions.argmax(-1)
    n = labels.shape[0]
    print("*"*80)
    print(labels.shape)
    print(preds.shape)
    labels = torch.from_numpy(labels).float()
    preds = torch.from_numpy(preds).float()
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        masked_lm_loss = CODE  # placeholder; the two variants I tried are described above
    loss = masked_lm_loss
    PPL = math.exp(loss)
    print(f'eval_loss\n{loss}')

3. After that I tried to define a new Trainer subclass that treats the loss as PPL, but it failed because:

prediction_scores = self.lm_head(sequence_output)
AttributeError: 'MyTrainer' object has no attribute 'lm_head'

I think one of these approaches could work, but I don't know how to implement it. Can someone help me?

Thank you in advance!

Here is the training code:

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        sequence_output = outputs[0]
        prediction_scores = self.lm_head(sequence_output)

pos_training_args = TrainingArguments(
    output_dir=cur_output_dir,
    num_train_epochs=args.train_epochs,
    per_device_train_batch_size=8,
    logging_steps=100,
    save_total_limit=3,
    evaluation_strategy='steps',
    eval_steps=50,
    learning_rate=2e-5,
    warmup_steps=pos_warmup_steps,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    disable_tqdm=False,
    gradient_accumulation_steps=4
)
    
trainer = Trainer(
    model=pos_model,
    data_collator=pos_collator,
    args=pos_training_args,
    train_dataset=pos_train_dataset,
    eval_dataset=pos_eval_dataset,
    compute_metrics=my_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
)

trainer.train()

Hi @kk12, self here refers to the Trainer instance, which does not have an lm_head. What you probably want is self.model.lm_head(sequence_output). Let me know if that works!

Hi lvwerra, thank you for your reply!

But it didn’t work. The error was:
RuntimeError: both arguments to matmul need to be at least 1D, but they are 0D and 2D

I built a new trainer that is almost the same as the original definition and uses self.model.lm_head(sequence_output) as you suggested:

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        sequence_output = outputs[0]
        prediction_scores = self.model.lm_head(sequence_output)

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        return masked_lm_loss

Then I set up the trainer as:

    cont_pre_trainer = MyTrainer(
        model=cont_pre_model,
        args=cont_pre_training_args,
        train_dataset=cont_pre_dataset,
        data_collator=cont_pre_collator,
        eval_dataset=eval_dataset
    )

It’s hard to know where the error comes from without a minimal example reproducing the error. It sounds like some of your shapes are not as they should be. Two comments:

On your point 2, you mentioned that the ~900M loss is way too high. I think it can be that high if you start training from scratch and the loss is very large.
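That said, if I read your my_metrics right, the loss there is computed from the argmax of the predictions instead of the raw logits. CrossEntropyLoss expects logits of shape (num_tokens, vocab_size) and integer targets of shape (num_tokens,), with -100 marking positions to ignore, so a perplexity metric would look roughly like this (just a sketch, assuming eval_pred.predictions are the unmodified logits):

import math
import torch
from torch.nn import CrossEntropyLoss

def my_metrics(eval_pred):
    logits = torch.from_numpy(eval_pred.predictions)  # (batch, seq_len, vocab_size)
    labels = torch.from_numpy(eval_pred.label_ids)    # (batch, seq_len), -100 on non-masked tokens
    loss_fct = CrossEntropyLoss()                     # ignore_index defaults to -100
    loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
    return {"perplexity": math.exp(loss.item())}

Note that this keeps the full logits for the whole eval set in memory, which can get large with the XLM-R vocabulary.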

Also, why aren't you using the built-in early stopping callback, where you can stop based on the loss? Since PPL is just exp(loss), this would be equivalent to using PPL.
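For example, something along these lines (reusing the names from your snippet) should behave the same as early stopping on perplexity; take it as a sketch rather than tested code:

import math
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

pos_training_args = TrainingArguments(
    output_dir=cur_output_dir,
    evaluation_strategy='steps',
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',  # exp() is monotonic, so best loss == best PPL
    greater_is_better=False,
)

trainer = Trainer(
    model=pos_model,
    args=pos_training_args,
    data_collator=pos_collator,
    train_dataset=pos_train_dataset,
    eval_dataset=pos_eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)

trainer.train()
eval_results = trainer.evaluate()
print(f"perplexity: {math.exp(eval_results['eval_loss']):.2f}")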