Prevent creation of multiple checkpoints

In my training arguments I set the model to save every 200 steps, but my model is fairly large relative to my disk size. I'd still like to save every 200 steps, but I want each save to overwrite the previous one instead of creating a new save point. Is this possible?


Strictly speaking it doesn't overwrite, but save_total_limit and save_only_model get you the intended effect: after each save, older checkpoints are deleted, so only the newest one stays on disk. Note that save_only_model=True shrinks each checkpoint by skipping the optimizer and scheduler states, but that also means you can't resume training from it.

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    save_steps=200,
    save_total_limit=1,      # deletes older checkpoints
    save_only_model=True,    # 4.37+; skips optimizer/scheduler to shrink size
)
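Conceptually, save_total_limit works by rotating checkpoint directories: after each save, Trainer sorts the checkpoint-&lt;step&gt; folders and deletes all but the newest ones. Here's a minimal stdlib sketch of that rotation idea (a hypothetical helper to illustrate the behavior, not Trainer's actual code):

```python
import os
import re
import shutil
import tempfile

def rotate_checkpoints(output_dir, save_total_limit=1):
    # Hypothetical helper: keep only the newest `save_total_limit`
    # checkpoint-<step> directories, deleting the rest.
    ckpts = []
    for name in os.listdir(output_dir):
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m and os.path.isdir(os.path.join(output_dir, name)):
            ckpts.append((int(m.group(1)), name))
    ckpts.sort()  # oldest (lowest step) first
    for _, name in ckpts[:-save_total_limit]:
        shutil.rmtree(os.path.join(output_dir, name))

# Simulate three saves at steps 200, 400, 600 with a limit of 1
out = tempfile.mkdtemp()
for step in (200, 400, 600):
    os.makedirs(os.path.join(out, f"checkpoint-{step}"))
    rotate_checkpoints(out, save_total_limit=1)
print(sorted(os.listdir(out)))  # only the newest checkpoint survives
```

So with save_total_limit=1 the disk footprint stays at roughly one checkpoint's worth of space, which is effectively the "overwrite" behavior you're after.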
