Hello,
I want to continue training an XLMRobertaForMaskedLM model on my own data. I want to use early stopping during evaluation to pick the best model, but I think the cross-entropy loss alone is not good enough, so I would like to use perplexity (PPL) in the form of `math.exp(eval_results['eval_loss'])`.
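For context, this is how I get the perplexity after a single evaluation run (a minimal sketch; `trainer` is assumed to be an already-built `Trainer`):

```python
import math

# Perplexity is just exp(cross-entropy), so it can be derived
# from the eval loss that trainer.evaluate() already reports.
eval_results = trainer.evaluate()
ppl = math.exp(eval_results["eval_loss"])
print(f"perplexity: {ppl:.2f}")
```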
1. But I only know that `metric_for_best_model='eval_loss'` works; when I try `metric_for_best_model=math.exp(eval_results['eval_loss'])`, it fails.
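(For reference, only the string form works for me; as far as I understand, `metric_for_best_model` expects the name of a metric reported at evaluation time, not a computed value:)

```python
from transformers import TrainingArguments

# Works: "eval_loss" names a metric the Trainer reports during evaluation.
training_args = TrainingArguments(
    output_dir="out",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Fails: metric_for_best_model cannot take an expression such as
# math.exp(eval_results["eval_loss"]); it has to be a metric name (a string).
```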
Then I looked at the Hugging Face source code; the MLM loss is originally defined as:
```python
masked_lm_loss = None
if labels is not None:
    loss_fct = CrossEntropyLoss()
    masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
```
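As far as I understand, `prediction_scores` there has shape (batch, seq_len, vocab_size) and `labels` has shape (batch, seq_len), so the two `view()` calls flatten everything over batch × seq_len. A toy illustration of those shapes (the vocabulary size is only illustrative):

```python
import torch
from torch.nn import CrossEntropyLoss

vocab_size = 250002                                 # XLM-R base vocab size (illustrative)
prediction_scores = torch.randn(2, 5, vocab_size)   # (batch, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (2, 5))       # (batch, seq_len)

loss_fct = CrossEntropyLoss()
loss = loss_fct(prediction_scores.view(-1, vocab_size), labels.view(-1))
print(loss.item())
```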
2. I tried to write a compute_metrics function; it looks like the code below but fails. The shapes of preds and labels are both (20, 120):
If CODE is `loss_fct(preds.view(-1, n), labels.view(-1))`, I get the error `ValueError: Expected input batch_size (120) to match target batch_size (2400).`
If CODE is `loss_fct(preds.view(-1, n), labels.view(-1, n))`, I get a loss value of about 900 million, which must be wrong.
```python
def my_metrics(eval_pred):
    labels = eval_pred.label_ids
    preds = eval_pred.predictions.argmax(-1)
    n = labels.shape[0]
    print("*" * 80)
    print(labels.shape)
    print(preds.shape)
    labels = torch.from_numpy(labels).float()
    preds = torch.from_numpy(preds).float()
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        masked_lm_loss = **CODE**
        loss = masked_lm_loss
        PPL = math.exp(loss)
        print(f'eval_loss\n{loss}')
```
3. After that I tried to define a new Trainer subclass that treats the loss as PPL, but it fails with:

```
prediction_scores = self.lm_head(sequence_output)
AttributeError: 'MyTrainer' object has no attribute 'lm_head'
```
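Roughly, what I was aiming for is something like the sketch below (just to show the intent, not working or verified code; it assumes the MLM loss can be taken directly from the model outputs instead of recomputing it through `lm_head`, and the class name `PPLTrainer` is only illustrative):

```python
import torch
from transformers import Trainer

class PPLTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # When "labels" are present in the inputs, XLMRobertaForMaskedLM
        # already returns the masked-LM cross-entropy in outputs.loss.
        outputs = model(**inputs)
        loss = outputs.loss
        # Treat perplexity (exp of the cross-entropy) as the value to optimize.
        ppl = torch.exp(loss)
        return (ppl, outputs) if return_outputs else ppl
```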
I think one of these approaches could work, but I don't know how to implement it properly. Can someone help me?
Thank you in advance!
Here is the training code:
```python
class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        sequence_output = outputs[0]
        prediction_scores = self.lm_head(sequence_output)

training_args = TrainingArguments(
    output_dir=cur_output_dir,
    num_train_epochs=args.train_epochs,
    per_device_train_batch_size=8,
    logging_steps=100,
    save_total_limit=3,
    evaluation_strategy='steps',
    eval_steps=50,
    learning_rate=2e-5,
    warmup_steps=pos_warmup_steps,
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    disable_tqdm=False,
    gradient_accumulation_steps=4
)

trainer = Trainer(
    model=pos_model,
    data_collator=pos_collator,
    args=pos_training_args,
    train_dataset=pos_train_dataset,
    eval_dataset=pos_eval_dataset,
    compute_metrics=my_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
)

trainer.train()
```