I am fine-tuning a BERT model for a token classification task. My training arguments are as follows:
```python
args = TrainingArguments(
    output_dir="bert-finetuned-ner",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=5,
    metric_for_best_model="f1",
    greater_is_better=True,
    load_best_model_at_end=True,
    learning_rate=2e-5,
    num_train_epochs=50,
    weight_decay=0.01,
    logging_steps=10,
    logging_strategy="epoch",
    push_to_hub=True,
)
```
I am using the `Trainer` class to train the model:
```python
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```
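To clarify what I expect the callback to do, here is my own minimal sketch of the patience logic (my paraphrase of the behavior, not the library's actual code): training stops once the tracked metric fails to improve for `early_stopping_patience` consecutive evaluations.

```python
def should_stop(history, patience=3, greater_is_better=True):
    """Return True if the metric history would trigger early stopping.

    `history` is the sequence of per-evaluation metric values (here, eval F1).
    """
    best = None
    counter = 0
    for value in history:
        improved = best is None or (value > best if greater_is_better else value < best)
        if improved:
            best, counter = value, 0
        else:
            counter += 1  # one more evaluation without improvement
        if counter >= patience:
            return True
    return False

# Best score at the 3rd evaluation, then 3 evaluations without improvement -> stop.
print(should_stop([0.88, 0.90, 0.93, 0.92, 0.91, 0.90]))  # True
# Metric still improving -> keep training.
print(should_stop([0.88, 0.90, 0.93]))                     # False
```

This matches what I observed: with patience 3 and the best F1 at epoch 7, training stopped after epoch 10.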
I then train the model and push it to the Hub using the following code:
```python
trainer.train()
trainer.push_to_hub()
```
The model trained for 10 epochs and then stopped because of the `EarlyStoppingCallback`. It achieved its highest F1 score at the 7th epoch, but the auto-generated model card on the Hugging Face Hub displays the metrics from the final (10th) epoch, which has a lower F1 score. This leaves me uncertain whether the model weights pushed to the Hub are from the 7th epoch or the 10th: I specified `load_best_model_at_end=True` in my `TrainingArguments` and expected the model card to reflect the metrics from the 7th epoch.
Is there a way to determine which epoch's checkpoint ended up in the model that was pushed to the Hub? I have looked through the Hugging Face Transformers and Hub documentation but have not found anything on this.
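One idea I had (an assumption on my part, not something I found documented): the `trainer_state.json` file that `Trainer` writes into each local `checkpoint-*` directory appears to record which checkpoint scored best. A minimal sketch of the fields I would inspect, using a faked file with made-up values (the real file contains more keys, and the step number here is hypothetical):

```python
import json

# Faked minimal trainer_state.json content, to show the fields of interest.
state_json = """
{
  "best_metric": 0.9312,
  "best_model_checkpoint": "bert-finetuned-ner/checkpoint-1234",
  "global_step": 1760,
  "epoch": 10.0
}
"""

state = json.loads(state_json)
print(state["best_model_checkpoint"])  # checkpoint restored by load_best_model_at_end?
print(state["best_metric"])            # best eval F1 observed during training
```

Is comparing `best_model_checkpoint` against the pushed weights a reliable way to confirm which epoch was uploaded, or is there an official mechanism I am missing?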