Behaviour change in checkpoints saved by Trainer

Hi there,

I was hoping to clarify an apparent change in what ends up in my output folder after training completes and I call .save_model() on the Trainer.

I use the following TrainingArguments:

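    from transformers import TrainingArguments

    # Define training args: log, evaluate and save every 500 steps, and
    # reload the best checkpoint (lowest eval loss) when training ends.
    # `args` and `learning_rate` are defined elsewhere in the training script.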
    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        per_device_eval_batch_size=args.eval_batch_size,
        warmup_steps=args.warmup_steps,
        learning_rate=learning_rate,
        logging_dir=f"{args.output_data_dir}/logs",
        logging_strategy="steps",
        logging_steps=500,
        evaluation_strategy="steps",
        eval_steps=500,
        save_strategy="steps",
        save_steps=500,
        save_total_limit=1,
        load_best_model_at_end=True,
        greater_is_better=False,
    )

In the model.tar.gz file under outputs/ there were the following files:

config.json
pytorch_model.bin
special_tokens_map.json
tokenizer_config.json
tokenizer.json
training_args.bin
vocab.txt

as well as two checkpoint folders, checkpoint-xxx. One of these corresponded to the best model, as determined by the lowest validation loss, and the other to the final checkpoint. My understanding was that the files listed above at the root of the directory correspond to the best model checkpoint. This behaviour was observed in transformers 4.17.0.

Now (transformers version 4.26.0) there is only a single checkpoint-xxx folder corresponding to the best model. I have verified that the artifacts in this folder are the same as those at the root of the directory, so it appears to me that the last model hasn’t been saved. I would always like to have the last model for comparison, so I was surprised not to see it there.
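
For anyone who wants to repeat that comparison, here is a minimal sketch of one way to do it: hash every file that exists in both the root directory and the checkpoint folder and report whether the contents match. The function names sha256_of and compare_artifacts are my own, made up for illustration, and this is not necessarily the exact check I ran.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_artifacts(root_dir: str, checkpoint_dir: str) -> dict:
    """Map each file name present in both directories to True if contents match."""
    root, ckpt = Path(root_dir), Path(checkpoint_dir)
    # Only consider regular files that exist under the same name in both places.
    shared = sorted(
        p.name for p in root.iterdir() if p.is_file() and (ckpt / p.name).is_file()
    )
    return {name: sha256_of(root / name) == sha256_of(ckpt / name) for name in shared}
```

Calling compare_artifacts with the extracted model directory and its checkpoint-xxx subfolder returns a dict such as {"config.json": True, ...}; a False entry means that file differs between the two locations. Files that exist in only one of the two directories are ignored.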

I would just like to confirm that what I’ve observed in 4.26.0 is the expected behaviour and that it is a change from the previous behaviour, and to get some advice on how I can save the last model checkpoint.

Thank you,
Owen
