Hi everyone, I am new to Hugging Face and need it for my master's final thesis. I have been training Seq2Seq models with AutoModelForSeq2SeqLM and Seq2SeqTrainer, and there is something that is not clear to me.
I have set up the arguments to store all the checkpoints and load the best one at the end of training, so that I can later push the best model to the Hub.
Here is the args code:
args = Seq2SeqTrainingArguments(
    output_dir=model_name,
    dataloader_num_workers=2,
    eval_steps=299,  # 280
    evaluation_strategy="steps",  # epoch
    fp16=True,
    generation_num_beams=2,
    hub_model_id=model_id,
    hub_strategy="checkpoint",
    hub_token="hf_XXXXXXXXXXXXXXXXXXXXXX",
    learning_rate=1e-5 * gradient_accumulation_steps,
    load_best_model_at_end=True,
    logging_steps=299,
    logging_strategy="steps",
    metric_for_best_model="rouge1",
    num_train_epochs=1,
    optim="adafactor",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    push_to_hub=True,
    save_strategy="steps",
    save_steps=299,
    # save_total_limit=3,
    report_to="wandb",
    warmup_steps=598,
    weight_decay=0.01,
    gradient_accumulation_steps=gradient_accumulation_steps,
    # gradient_checkpointing=True,
    label_smoothing_factor=0.1,
    group_by_length=False,
)
Once training finishes by early stopping, I get a message saying that the best model was loaded:
Training completed. Do not forget to share your model on huggingface.co/models =)
Loading best model from pegasus-newsroom-cnn_full-adafactor-bs6/checkpoint-598 (score: 39.4265).
That makes sense, so I push it to the Hub with trainer.push_to_hub(). But when I go to the model page on the Hub, the model card details are not the ones from the best evaluation, which was at step 598 as shown in the table below. Instead, the card is filled with the last step's details, which makes it unclear which step was actually pushed.
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 3.2894 | 0.1 | 299 | 2.9464 | 39.4079 | 18.3064 | 28.093 | 36.5182 | 64.6904 |
| 3.0427 | 0.2 | 598 | 2.9307 | 39.4265 | 18.2924 | 28.247 | 36.6382 | 60.5696 |
| 3.1017 | 0.3 | 897 | 2.9891 | 39.0977 | 17.9198 | 27.9078 | 36.2363 | 58.5172 |
| 3.2891 | 0.4 | 1196 | 3.5756 | 29.5555 | 11.7552 | 22.4675 | 27.2432 | 45.0232 |
| 637.0317 | 0.5 | 1495 | nan | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
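
To double-check that step 598 really is the best one by rouge1, here is a small sketch that picks the best evaluation entry from a list of logged metrics. The entries are copied from the table above. The "eval_" key names and the helper best_eval_entry are my own assumptions for illustration, modelled on what I see in trainer.state.log_history, so they may need adjusting.

```python
import math

# Evaluation entries copied from the table above. The "eval_" prefixed keys
# are an assumption about how the Trainer names logged evaluation metrics.
log_history = [
    {"step": 299, "eval_loss": 2.9464, "eval_rouge1": 39.4079},
    {"step": 598, "eval_loss": 2.9307, "eval_rouge1": 39.4265},
    {"step": 897, "eval_loss": 2.9891, "eval_rouge1": 39.0977},
    {"step": 1196, "eval_loss": 3.5756, "eval_rouge1": 29.5555},
    {"step": 1495, "eval_loss": float("nan"), "eval_rouge1": 0.0},
]


def best_eval_entry(history, metric="eval_rouge1", greater_is_better=True):
    """Return the logged entry with the best value of `metric` (hypothetical helper)."""
    # Skip entries that do not contain the metric or where it is NaN.
    entries = [e for e in history if metric in e and not math.isnan(e[metric])]
    if not entries:
        raise ValueError(f"no usable entries for metric {metric!r}")
    pick = max if greater_is_better else min
    return pick(entries, key=lambda e: e[metric])


best = best_eval_entry(log_history)
print(best["step"], best["eval_rouge1"])  # prints: 598 39.4265
```

With a real run, I assume the same function could be called on trainer.state.log_history, and the result compared with trainer.state.best_model_checkpoint.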
And the results on the model card:
This model was trained from scratch on the cnn_dailymail dataset.
It achieves the following results on the evaluation set:
- Loss: nan
- Rouge1: 0.0
- Rouge2: 0.0
- Rougel: 0.0
- Rougelsum: 0.0
- Gen Len: 1.0
Even more confusing, when I look at the model's files in the repo on the Hub, the files are described as coming from step 299.
My question is: is there any way to guarantee that the best model is loaded and pushed to the Hub? Is it possible to update the model card with the evaluation results from that checkpoint's step, or should I do it manually?
I am very confused about this, and about whether the uploaded model corresponds to the best model, which was at step 598, and not to the last step or to step 299.
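
One workaround I am considering is sketched below, but I am not sure it is correct. The idea is to re-evaluate right after training, when the best checkpoint should already be loaded because of load_best_model_at_end=True, and only then push. My assumption, which I have not confirmed, is that the model card is filled from the most recent logged evaluation, so a fresh evaluation of the best model would replace the nan results. The helper name evaluate_then_push is hypothetical. It only relies on trainer.evaluate() and trainer.push_to_hub(commit_message=...).

```python
def evaluate_then_push(trainer, commit_message="Push best checkpoint"):
    """Re-evaluate the currently loaded model, then push it to the Hub.

    `trainer` is expected to be a transformers Trainer or Seq2SeqTrainer
    that has finished training with load_best_model_at_end=True, so that
    trainer.model should already hold the best checkpoint's weights.
    """
    # Run a fresh evaluation so that the most recent logged metrics belong
    # to the model that is about to be pushed.
    metrics = trainer.evaluate()
    # Push the current model (and the generated model card) to the Hub.
    trainer.push_to_hub(commit_message=commit_message)
    return metrics


# Intended usage after trainer.train() has finished:
# metrics = evaluate_then_push(trainer, commit_message="Best model (step 598)")
```

If someone can confirm whether this actually updates the card, or whether editing the card by hand is the expected approach, that would answer my question.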
Many thanks in advance,