Loading model from checkpoint after error in training

If we use just the directory as it was saved without specifying which checkpoint:

model = RobertaForMaskedLM.from_pretrained("./saved/")

which model is actually used when calling model()?

In my case, I have the arguments:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./saved',
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    logging_steps=3000,
    save_steps=3000,
    save_total_limit=2,
    seed=1,
    fp16=True
)

The trainer setup:

trainer = Trainer(
    model=some_roberta_model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

And running:

trainer.train()

trainer.save_model('./saved')

After this, the ./saved folder contains config.json, training_args.bin and pytorch_model.bin files, plus two checkpoint sub-folders. Each of these checkpoint folders also contains its own config.json, training_args.bin and pytorch_model.bin.

When I load the folder:

new_roberta = AutoModel.from_pretrained('./saved')

Which one is the model that is used in:

new_roberta(**token_output)
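
To sidestep the ambiguity, I suppose I could point from_pretrained at one checkpoint sub-folder explicitly instead of the parent folder. Below is a minimal sketch of how I would pick that folder, assuming the sub-folders are named checkpoint-<step> (that naming and the helper name latest_checkpoint are my assumptions, not something I have confirmed in the docs):

```python
import re
from pathlib import Path

# Assumed naming scheme for the checkpoint sub-folders: "checkpoint-<step>".
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint sub-folder with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CKPT_RE.match(child.name)
        # Only directories whose name matches the assumed pattern are considered.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Hypothetical usage (not run here, needs transformers and a real checkpoint):
# new_roberta = AutoModel.from_pretrained(str(latest_checkpoint("./saved")))
```

That way the weights being loaded would be unambiguous, whatever the answer for the parent folder turns out to be.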

Are the config.json, training_args.bin and pytorch_model.bin files in the main folder the same as the corresponding ones in any of the checkpoint sub-folders?
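
One way I could check this myself is to hash the files and compare them byte for byte. Here is a rough sketch using only the standard library; the function names are hypothetical, and it assumes the sub-folders are named checkpoint-<step>. Note that it only tells me whether the files are identical on disk, not why they might differ:

```python
import hashlib
from pathlib import Path

FILE_NAMES = ("config.json", "training_args.bin", "pytorch_model.bin")


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_checkpoints(output_dir, names=FILE_NAMES):
    """For each checkpoint sub-folder, report per file whether it is
    byte-identical to the copy in the main folder.

    Values are True (identical), False (different) or None (a copy is missing).
    """
    root = Path(output_dir)
    result = {}
    for ckpt in sorted(root.glob("checkpoint-*")):
        if not ckpt.is_dir():
            continue
        per_file = {}
        for name in names:
            top, sub = root / name, ckpt / name
            if top.is_file() and sub.is_file():
                per_file[name] = file_digest(top) == file_digest(sub)
            else:
                per_file[name] = None
        result[ckpt.name] = per_file
    return result
```

Calling compare_with_checkpoints("./saved") would return one entry per checkpoint folder, so I could see at a glance whether the main folder matches any of them.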

Thanks!
