🤗Trainer not saving after save_steps

vasudevgupta · April 13, 2021, 11:16am

I am using Trainer for training. My training args are as follows:

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="bigbird-nq-output-dir",
        overwrite_output_dir=False,
        do_train=True,
        do_eval=True,
        evaluation_strategy="epoch",
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-5,
        num_train_epochs=3,
        logging_strategy="epoch",
        save_strategy="steps",
        run_name="bigbird-nq",
        disable_tqdm=False,
        load_best_model_at_end=True,
        report_to="wandb",
        remove_unused_columns=False,
        fp16=True,
    )

I can't find any checkpoints being saved every 500 steps. Any idea why?

sgugger · April 13, 2021, 2:10pm

With load_best_model_at_end=True, your save_strategy is ignored and checkpoints are saved on the evaluation_strategy schedule instead. Since that is "epoch" in your arguments, you will find one checkpoint at the end of each epoch.
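To make the rule above concrete, here is a minimal sketch in plain Python. It is not the Trainer source code: `effective_save_strategy` is a hypothetical helper written only for illustration, the arguments are passed as a plain dict rather than a real `TrainingArguments` object, and the fallback defaults (`"no"` and `"steps"`) are assumptions.

```python
def effective_save_strategy(args: dict) -> str:
    """Hypothetical helper: models the behavior described in the reply above.

    If load_best_model_at_end is set, checkpoints follow the evaluation
    schedule; otherwise the requested save_strategy is used.
    """
    if args.get("load_best_model_at_end", False):
        # save_strategy is ignored; saving follows evaluation_strategy
        return args.get("evaluation_strategy", "no")  # assumed default
    return args.get("save_strategy", "steps")  # assumed default


# The relevant arguments from the question, as a plain dict
question_args = {
    "evaluation_strategy": "epoch",
    "save_strategy": "steps",
    "load_best_model_at_end": True,
}

print(effective_save_strategy(question_args))
# Same arguments, but without loading the best model at the end
print(effective_save_strategy({**question_args, "load_best_model_at_end": False}))
```

Under this reading, there are two likely ways to get step-based checkpoints: drop load_best_model_at_end, or keep it and switch evaluation_strategy to "steps" so that evaluation and saving share the same step schedule.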

vasudevgupta · April 13, 2021, 2:32pm

Got it. Thanks a lot!
