Early stopping callback problem


I am having problems with the EarlyStoppingCallback I set up in my trainer class as below:

training_args = TrainingArguments(
    output_dir = 'BERT',
    num_train_epochs = epochs,
    do_train = True,
    do_eval = True,
    evaluation_strategy = 'epoch',
    logging_strategy = 'epoch',
    per_device_train_batch_size = batch_size,
    per_device_eval_batch_size = batch_size,
    warmup_steps = 250,
    weight_decay = 0.01,
    fp16 = True,
    metric_for_best_model = 'eval_loss',
    load_best_model_at_end = True,
)

trainer = MyTrainer(
    model = bert,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,
    compute_metrics = compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)],
)


I keep getting the following error:

I already tried running the code without the metric_for_best_model argument, but I still get the same error.

I tweaked the Trainer class a bit to report metrics during training, and also created custom_metrics to report during validation. I suspect I made a mistake there, and that's why I can't retrieve the validation loss now. See here for the tweaked code.

Thanks in advance!!

You won’t be able to use the EarlyStoppingCallback with a nested dictionary of metrics as you did, no. And it will need the metric you are looking for to be prefixed with eval_ (otherwise it will add that prefix itself, unless you change the code too). You will probably need to write your own version of the callback for this use case.
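To illustrate the idea, here is a minimal, library-free sketch of the two pieces such a custom callback would need: flattening a nested metrics dict into the flat `eval_`-prefixed keys the patience check expects, and the patience counter itself. The names (`flatten_metrics`, `PatienceTracker`) are illustrative, not part of the Transformers API; in practice you would wire this logic into a `TrainerCallback.on_evaluate`.

```python
def flatten_metrics(metrics, parent_key="", sep="_"):
    """Flatten a nested metrics dict, e.g. {'eval': {'loss': 0.5}} -> {'eval_loss': 0.5}."""
    flat = {}
    for key, value in metrics.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

class PatienceTracker:
    """Signal a stop when the watched metric has not improved for `patience` evaluations."""
    def __init__(self, patience=3, greater_is_better=False):
        self.patience = patience
        self.greater_is_better = greater_is_better
        self.best = None
        self.bad_evals = 0

    def step(self, value):
        improved = (
            self.best is None
            or (value > self.best if self.greater_is_better else value < self.best)
        )
        if improved:
            self.best = value
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience  # True -> time to stop training
```

In a real callback, `on_evaluate` would call `flatten_metrics` on the logged metrics, look up `eval_loss`, feed it to the tracker, and set `control.should_training_stop = True` when it returns `True`.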

At some point, instead of rewriting the whole Trainer, you might be interested in writing your own training loop with Accelerate. You can still have mixed-precision and distributed training, but you will have full control over your training loop. There is an example for each task using Accelerate (the run_xxx_no_trainer scripts) in the examples folder of Transformers.


Thanks so much @sgugger! Will try it out!