Save only best model in Trainer

I am using the code below:

  from transformers import Trainer, TrainingArguments

  args = TrainingArguments(
      output_dir=f"./out_fold{i}",
      overwrite_output_dir=True,
      evaluation_strategy="steps",
      eval_steps=40,
      logging_steps=40,
      learning_rate=5e-5,
      per_device_train_batch_size=8,
      per_device_eval_batch_size=8,
      num_train_epochs=10,
      seed=0,
      save_total_limit=1,
      # report_to="none",
      # logging_steps='epoch',
      load_best_model_at_end=True,
      save_strategy="no",
  )
  trainer = Trainer(
      model=model,
      args=args,
      train_dataset=train_dataset,
      eval_dataset=val_dataset,
      # compute_metrics=compute_metrics,
      # callbacks=[EarlyStoppingCallback(early_stopping_pa)],
  )
  trainer.train()
  trainer.save_model(f'out_fold{i}')

Here, even though save_strategy = "no", checkpoints are still being written to disk from the start of training (see the log below), and the disk eventually fills up. Can you suggest what's going wrong?

***** Running Evaluation *****
Num examples = 567
Batch size = 8
Saving model checkpoint to ./out_fold0/checkpoint-40
Configuration saved in ./out_fold0/checkpoint-40/config.json
Model weights saved in ./out_fold0/checkpoint-40/pytorch_model.bin
Deleting older checkpoint [out_fold0/checkpoint-760] due to args.save_total_limit
***** Running Evaluation *****
