What is the purpose of save_pretrained()?

Hello everyone. I hope my question is not too silly, but there is something that confuses me.

Let’s say I load a Hugging Face model using the from_pretrained() method, and then fine-tune it using the Trainer class. Now, via TrainingArguments, I get the chance to define an argument called output_dir. If I specify a directory here, won’t my model be saved in this directory, enabling me to load it again in the future from this folder using from_pretrained()?
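
For concreteness, here is roughly the setup I have in mind (the model name and dataset are just placeholders):

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Placeholder model name, just to illustrate the setup I mean.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="./my_finetuned_model",  # checkpoints get written here
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_train_dataset,  # hypothetical, already tokenized
)
trainer.train()
```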

Here lies my question: if this argument lets me save the model, what is the purpose of save_pretrained()?

Looking at their respective documentation, it is clear that they do something different, however: the first one saves something called checkpoints, while the second saves “the model and its configuration”.

Can someone explain to me their difference, in other words, what is the difference between saving the checkpoints or the model? Won’t the last checkpoint be the same as the model and its weights?

Thanks in advance.

Hi there! The question is a bit odd in the sense that you are asking: “Why does the model have this method when the Trainer has that argument?” The basic answer is: because they are two different objects.

Not everyone uses the Trainer to train their model, so there needs to be a method directly on the model to properly save it.

Now, inside the Trainer, you could very well never save any checkpoint (save_strategy="no"), or the last checkpoint saved could be from before the end of training (with save_strategy="steps"), so you won’t necessarily have the final model automatically saved inside a checkpoint.
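
For example (a rough sketch; the output directory and step count are placeholders):

```python
from transformers import TrainingArguments

# No checkpoints are written during training at all.
args_no_checkpoints = TrainingArguments(output_dir="out", save_strategy="no")

# A checkpoint is written every 500 steps; the last one may predate the end of training.
args_every_500_steps = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    save_steps=500,
)
```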

A checkpoint, by the way, is just a folder with your model, tokenizer (if it was passed to the Trainer), and all the files necessary to resume training from there (optimizer state, learning-rate scheduler state, trainer state, etc.).
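
A rough sketch of resuming from such a checkpoint (the path is a placeholder):

```python
# Resume from the most recent checkpoint found under output_dir.
trainer.train(resume_from_checkpoint=True)

# Or point to a specific checkpoint folder (placeholder path):
# trainer.train(resume_from_checkpoint="./my_finetuned_model/checkpoint-500")
```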

To save your model at the end of training, you should use trainer.save_model(optional_output_dir), which will, behind the scenes, call the save_pretrained() of your model (the argument is optional and defaults to the output_dir you set in TrainingArguments).
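
Something like this (the directories are placeholders):

```python
from transformers import AutoModelForSequenceClassification

# Saves the final model (and the tokenizer, if one was passed to the Trainer).
trainer.save_model("./my_finetuned_model/final")  # argument optional, defaults to output_dir

# Later, load it back like any other pretrained model.
model = AutoModelForSequenceClassification.from_pretrained("./my_finetuned_model/final")
```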

Hello. Thank you very much for the detailed answer!

By the way, if I create a model class that inherits from torch.nn.Module and slightly alter a Hugging Face pretrained model (e.g. by adding a different classification head), then train it using native PyTorch, I should use torch.save() instead, right?

You should subclass PreTrainedModel if your model is very similar to a Transformers model, to be able to retain the full functionality (such as save_pretrained() and from_pretrained()). See the sketch below.
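
A minimal sketch of what such a subclass could look like (the class name, head, and paths are just illustrative):

```python
import torch.nn as nn
from transformers import PreTrainedModel, BertConfig, BertModel

class MyBertClassifier(PreTrainedModel):
    config_class = BertConfig  # config type that travels with the weights

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.last_hidden_state[:, 0])  # [CLS] token

config = BertConfig(num_labels=2)
model = MyBertClassifier(config)

# save_pretrained / from_pretrained now work as with any Transformers model.
model.save_pretrained("./my_bert_classifier")
reloaded = MyBertClassifier.from_pretrained("./my_bert_classifier")
```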

Otherwise yes, you should just use torch.save.
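
For instance, a rough sketch of saving and reloading such a custom module with torch.save (the class name, model name, and file path are placeholders):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

# Hypothetical example: a pretrained encoder with a custom classification head.
class MyClassifier(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(outputs.last_hidden_state[:, 0])

model = MyClassifier(num_labels=2)
# ... train with a native PyTorch loop ...

# Save only the weights (state_dict), which is the usual recommendation.
torch.save(model.state_dict(), "my_classifier.pt")

# To reload: rebuild the architecture, then load the weights.
model = MyClassifier(num_labels=2)
model.load_state_dict(torch.load("my_classifier.pt"))
```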

Cool. As far as I can see, PreTrainedModel inherits from torch.nn.Module, so I guess there shouldn’t be much difference.

Thank you for everything. Have a nice day.
