Trainer doesn't show the loss at each step

I'd also add that although save_steps is 512, nothing has been written to the specified checkpoint output dir. (It's now at step ~2000, and nothing has been printed for logging either.)
Showing my full training args below.

Any insight would be greatly appreciated. I’m really scratching my head over the logging and saving issue.

from transformers import Trainer, TrainingArguments
from transformers.optimization import Adafactor, get_constant_schedule

batch_size = 1

training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=batch_size,
    do_train=True,
    # fp16=True,  # This has a known bug with T5
    gradient_accumulation_steps=32,
    logging_steps=128,
    save_steps=512,
    overwrite_output_dir=True,
    save_total_limit=10,
)

optimizer = Adafactor(model.parameters(), lr=1e-3, relative_step=False, warmup_init=False)
scheduler = get_constant_schedule(optimizer)
optimizers = optimizer, scheduler

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=optimizers
)

trainer.train()
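In case it's relevant, here's the quick arithmetic I did while puzzling over this. If logging_steps and save_steps count optimizer steps (i.e. updates after gradient accumulation) rather than forward batches — and I'm not sure which one the displayed step counter shows — then nothing would be expected yet at batch ~2000:

```python
# Sketch, assuming logging_steps / save_steps count optimizer steps
# (one optimizer step per gradient_accumulation_steps batches).
GRAD_ACCUM = 32      # gradient_accumulation_steps
LOGGING_STEPS = 128
SAVE_STEPS = 512

def batch_of_optimizer_step(n, grad_accum=GRAD_ACCUM):
    """Dataloader batch at which optimizer step n completes."""
    return n * grad_accum

print(batch_of_optimizer_step(LOGGING_STEPS))  # first log at batch 4096
print(batch_of_optimizer_step(SAVE_STEPS))     # first checkpoint at batch 16384
```

Under that assumption the first log wouldn't appear until batch 4096 and the first checkpoint until batch 16384 — but I'm not certain this is the explanation, since the step counter may already be counting optimizer steps.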