Using the loss of a specific dataset as the early stopping metric

Hi everyone,

I’m trying to fine-tune an XGLM model from Hugging Face for the quy_Latn language. Currently, I have one training dataset and several evaluation datasets.

I want to use EarlyStoppingCallback in my Trainer instance and specify which dataset's loss early stopping should monitor. For example, if I pass two evaluation datasets to the Trainer, named eng_Latn and quy_Latn, I would like the loss on the second one to determine the best model.

Here is some code from my script:

    from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

    training_args = TrainingArguments(
        output_dir=f"{dir_path}/checkpoints",
        logging_dir=f"{dir_path}/model_logs",
        save_strategy="steps",
        evaluation_strategy="steps",
        save_steps=1,  # Save a checkpoint every step
        eval_steps=1,  # Evaluate every step
        save_total_limit=1,  # Only keep one checkpoint
        per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH_SIZE,
        per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH_SIZE,
        num_train_epochs=EPOCHS,
        remove_unused_columns=False,
        report_to="all",
        logging_steps=10,
        fp16=True,
        greater_is_better=False,  # Lower loss is better
        metric_for_best_model="quy_Latn_loss",  # Loss on the quy_Latn evaluation dataset
        load_best_model_at_end=True,
        save_safetensors=False,
        prediction_loss_only=True,
    )

    # eval_datasets holds the named evaluation datasets (eng_Latn, quy_Latn, ...)
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_datasets,
        data_collator=lambda data: collate_fn(data, tokenizer),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )

The problem is that I get an error saying that the eval_quy_Latn_loss metric does not exist, even though I can see that loss in the logs. Does anyone know what the problem is?
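
To make the naming concrete, here is a minimal sketch of how the metric keys appear to be built. It is not the actual Trainer code: `resolve_metric_key` and `per_dataset_metrics` are hypothetical helpers, the loss values are made-up placeholders, and it assumes that the Trainer adds an `eval_` prefix to `metric_for_best_model` when it is missing and that each named dataset is evaluated separately with keys of the form `eval_<name>_loss`.

```python
def resolve_metric_key(metric_for_best_model: str) -> str:
    """Add the "eval_" prefix when it is missing (assumed Trainer behaviour)."""
    if metric_for_best_model.startswith("eval_"):
        return metric_for_best_model
    return f"eval_{metric_for_best_model}"


def per_dataset_metrics(losses: dict[str, float]) -> list[dict[str, float]]:
    """Build one metrics dict per evaluation dataset, keyed as eval_<name>_loss."""
    return [{f"eval_{name}_loss": loss} for name, loss in losses.items()]


# Placeholder loss values, for illustration only.
example_losses = {"eng_Latn": 2.31, "quy_Latn": 3.87}
metric_key = resolve_metric_key("quy_Latn_loss")

# Check each per-dataset metrics dict for the key that early stopping needs.
for metrics in per_dataset_metrics(example_losses):
    status = "found" if metric_key in metrics else "missing"
    print(f"{sorted(metrics)}: {metric_key} is {status}")
```

Under these assumptions the key is missing from the eng_Latn results and present in the quy_Latn results. If the callback really is invoked once per dataset, that could explain why the metric is reported as missing even though it shows up in the logs.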

Thanks in advance.