Hey! I am trying to continue training by loading a checkpoint, but for some reason it always starts from scratch. Probably I am just missing something.
```python
training_arguments = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy='steps',
    per_device_train_batch_size=training_config['per_device_train_batch_size'],
    per_device_eval_batch_size=training_config['per_device_eval_batch_size'],
    fp16=True,
    output_dir=training_output_path,
    overwrite_output_dir=True,
    logging_steps=training_config['logging_steps'],
    save_steps=training_config['save_steps'],
    eval_steps=training_config['eval_steps'],
    warmup_steps=training_config['warmup_steps'],
    metric_for_best_model='eval_loss',
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_arguments,
    compute_metrics=compute_metrics,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
```
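To give the full picture, this is roughly what the rest of my setup looks like (a sketch, not verbatim; paths abbreviated): I load the weights from the checkpoint directory myself and then call `train()` with no arguments.

```python
from transformers import EncoderDecoderModel

# Load the model weights from the last checkpoint directory (path abbreviated)
model = EncoderDecoderModel.from_pretrained('.../models/checkpoint-2000')

# ... training_arguments and trainer are built as above ...

# Start training -- note that no checkpoint is passed here
trainer.train()
```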
Here are the logs:
```
loading weights file .../models/checkpoint-2000/pytorch_model.bin
All model checkpoint weights were used when initializing EncoderDecoderModel.
***** Running training *****
  Num examples = 222862
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 83574
```
I am missing lines like these:
```
Continuing training from checkpoint, will skip to saved global_step
Continuing training from epoch 0
Continuing training from global step 48000
Continuing training from 0 non-embedding floating-point operations
Will skip the first 48000 steps in the first epoch
```
I found these lines here: Load from checkpoint not skipping steps - Transformers - Hugging Face Forums
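From reading that thread, my guess is that I need to pass the checkpoint to `train()` explicitly instead of only loading the weights into the model. Something like this (untested guess on my side; the checkpoint path is my abbreviated one from above):

```python
# Resume from a specific checkpoint directory, which should also restore
# the optimizer, scheduler, and global step -- not just the model weights
trainer.train(resume_from_checkpoint='.../models/checkpoint-2000')

# or let the Trainer pick up the last checkpoint found in output_dir
trainer.train(resume_from_checkpoint=True)
```

Is that the right way to do it?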
Maybe somebody can help me? Thank you in advance!