Loading model from checkpoint after error in training

MattiaMG · September 27, 2021, 1:01am

If I load the model by passing only the directory it was saved to, without specifying which checkpoint to use:

from transformers import RobertaForMaskedLM
model = RobertaForMaskedLM.from_pretrained("./saved/")

which weights does the model actually use when I call model(...)?
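If I understand from_pretrained correctly, passing a directory makes it load config.json and the weights file sitting directly in that directory (here, the files written by trainer.save_model). The checkpoint-* subfolders are only used if you pass one of their paths explicitly. Below is a minimal stdlib sketch for locating the newest checkpoint folder. It assumes the Trainer's checkpoint-<step> folder naming, and list_checkpoints and latest_checkpoint are hypothetical helpers. transformers.trainer_utils.get_last_checkpoint does something similar.

```python
import os
import re

# Assumption: Trainer names its checkpoint folders "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(output_dir):
    """Return (step, path) pairs for checkpoint-* subfolders, oldest first."""
    found = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only count directories whose name matches the checkpoint pattern.
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    # Sort numerically by step, so checkpoint-9000 comes before checkpoint-12000.
    return sorted(found)


def latest_checkpoint(output_dir):
    """Path of the checkpoint with the highest step, or None if there is none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1][1] if checkpoints else None
```

With it, RobertaForMaskedLM.from_pretrained(latest_checkpoint("./saved")) would load the newest checkpoint's weights instead of the top-level ones. Sorting on the parsed step number rather than on the folder name matters: as strings, "checkpoint-12000" sorts before "checkpoint-9000".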

In my case, the training arguments are:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./saved',
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    logging_steps=3000,
    save_steps=3000,
    save_total_limit=2,
    seed=1,
    fp16=True
)

The Trainer is set up as:

trainer = Trainer(
    model=some_roberta_model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

Then I run:

trainer.train()

trainer.save_model('./saved')

After this, the ./saved folder contains config.json, training_args.bin and pytorch_model.bin files, plus two checkpoint sub-folders. However, each of these checkpoint folders also contains its own config.json, training_args.bin and pytorch_model.bin.
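With save_steps=3000 and save_total_limit=2, the two sub-folders should be the two most recent checkpoints. A rough sketch of that bookkeeping follows. It assumes checkpoints are written only at multiples of save_steps (this may differ between Trainer versions), no max_steps override, and no load_best_model_at_end. The function surviving_checkpoints and its step formula are an approximation, not the Trainer's own code.

```python
import math


def surviving_checkpoints(num_examples, batch_size, num_epochs, save_steps,
                          save_total_limit, num_devices=1, grad_accum_steps=1):
    """Estimate the total step count and which checkpoint-<step> folders remain.

    Assumes checkpoints are saved only at multiples of save_steps and that
    save_total_limit keeps the newest ones (no load_best_model_at_end).
    """
    # Approximate optimizer steps per epoch: batches per epoch / accumulation.
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = steps_per_epoch * num_epochs
    # Steps at which a checkpoint would be written.
    saved = list(range(save_steps, total_steps + 1, save_steps))
    # Older checkpoints are deleted; None means "keep everything".
    kept = saved[-save_total_limit:] if save_total_limit else saved
    return total_steps, [f"checkpoint-{step}" for step in kept]
```

For a hypothetical dataset of 100,000 examples with batch size 8 on one device, surviving_checkpoints(100_000, 8, 1, 3000, 2) returns (12500, ["checkpoint-9000", "checkpoint-12000"]). In that case the last 500 steps happen after the newest checkpoint, so the top-level pytorch_model.bin written by save_model would hold weights from 500 steps later than checkpoint-12000.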

When I load the folder:

from transformers import AutoModel
new_roberta = AutoModel.from_pretrained('./saved')

which set of weights (the top-level files or one of the checkpoints) is used when I call:

new_roberta(**token_output)

Are the config.json, training_args.bin and pytorch_model.bin files in the main folder the same as the corresponding files in either of the checkpoint sub-folders?
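One way to check this empirically is to hash the files. Below is a minimal stdlib sketch in which file_sha256 and compare_with_checkpoints are hypothetical helpers. It assumes the weights file is named pytorch_model.bin; newer transformers versions may write model.safetensors instead, which is why the file name is a parameter.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so a large weights file is never
    loaded into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_checkpoints(output_dir, filename="pytorch_model.bin"):
    """Map each checkpoint-* folder to True if its copy of `filename` is
    byte-identical to the top-level one, False if it differs, None if missing."""
    top_hash = file_sha256(os.path.join(output_dir, filename))
    result = {}
    for name in sorted(os.listdir(output_dir)):
        sub = os.path.join(output_dir, name)
        # Skip anything that is not a checkpoint-* directory (e.g. "runs").
        if not (name.startswith("checkpoint-") and os.path.isdir(sub)):
            continue
        candidate = os.path.join(sub, filename)
        result[name] = (file_sha256(candidate) == top_hash
                        if os.path.isfile(candidate) else None)
    return result
```

Matching hashes mean the files are byte-identical. Different hashes do not strictly prove the weights differ, since serialization details could vary, so for a definitive answer load both files with torch.load and compare the tensors.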

Thanks!
