Hi everyone,
I’m trying to fine-tune an XGLM model from Hugging Face for the quy_Latn language. Currently, I have one training dataset and several datasets that I would like to evaluate my model on.
The thing is that I want to add an EarlyStoppingCallback to my Trainer instance, and I want to specify which evaluation loss the early stopping should monitor. For example, if I pass two datasets to the Trainer, named eng_Latn and quy_Latn, I would like the second one to determine the best model.
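For context, the eval datasets are passed as a dict keyed by language code (placeholder data below, not my real datasets); as far as I understand, the Trainer prefixes each dataset’s metrics with "eval_<key>", so the per-dataset losses should be logged under names like these:

```python
eval_datasets = {
    "eng_Latn": ["<tokenized English eval examples>"],   # placeholder
    "quy_Latn": ["<tokenized Quechua eval examples>"],   # placeholder
}

# Expected metric names: the Trainer inserts each dict key into the
# metric name, e.g. key "quy_Latn" -> "eval_quy_Latn_loss".
metric_names = [f"eval_{name}_loss" for name in eval_datasets]
# -> ["eval_eng_Latn_loss", "eval_quy_Latn_loss"]
```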
Here is some code from my script:
training_args = TrainingArguments(
    output_dir=f"{dir_path}/checkpoints",
    logging_dir=f"{dir_path}/model_logs",
    save_strategy="steps",
    evaluation_strategy="steps",
    save_steps=1,  # save every step
    eval_steps=1,  # evaluate every step
    save_total_limit=1,  # keep only one checkpoint
    per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH_SIZE,
    num_train_epochs=EPOCHS,
    remove_unused_columns=False,
    report_to="all",
    logging_steps=10,
    fp16=True,
    greater_is_better=False,
    metric_for_best_model="quy_Latn_loss",
    load_best_model_at_end=True,
    save_safetensors=False,
    prediction_loss_only=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_datasets,
    data_collator=lambda data: collate_fn(data, tokenizer),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
The problem is that I get an error saying that the eval_quy_Latn_loss metric does not exist, even though I can see that loss in the logs. Does anyone know what the problem is in this case?
Thanks in advance.