Hello everyone. I hope my question is not too silly, but there is something that confuses me.
Let’s say I load a huggingface model using from_pretrained() method, and then finetune it using the Trainer class. Now, via TrainingArguments, I get the chance to define an argument called output_dir. If I specify a directory here, won’t my model be saved in this directory, thus enabling me to load it again in the future from this folder, using from_pretrained()?
Here lies my question: if this argument lets me save the model, what is the purpose of save_pretrained()?
Looking at their respective documentation, however, it is clear that they do different things: the first one saves something called checkpoints, while the second one saves “the model and its configuration”.
Can someone explain the difference to me? In other words, what is the difference between saving checkpoints and saving the model? Won’t the last checkpoint be the same as the model and its weights?
Thanks in advance.
Hi there! The question is a bit weird in the sense you are asking: “Why does the model have this method when the Trainer has that method?” The basic answer is: because they are two different objects.
Not everyone uses the Trainer to train their model, so there needs to be a method directly on the model to properly save it.

Now, inside the Trainer, you could very well never save any checkpoint (save_strategy="no"), or the last saved checkpoint could come before the end of training (with save_strategy="steps"), so you won’t necessarily have the last model automatically saved inside a checkpoint.
A checkpoint, by the way, is just a folder with your model, tokenizer (if it was passed to the Trainer) and all the files necessary to resume training from there (optimizer state, lr scheduler state, trainer state, etc.).
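For example, the checkpointing behaviour is controlled entirely through TrainingArguments; a minimal sketch (the directory name and step counts below are just illustrative values, not anything you have to use):

```python
from transformers import TrainingArguments

# Save a full checkpoint (model + optimizer/scheduler/trainer state)
# every 500 steps, keeping at most 2 checkpoints on disk.
args = TrainingArguments(
    output_dir="my-finetuned-model",  # checkpoint-500, checkpoint-1000, ... land here
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,
)

# save_strategy="no" would disable checkpointing entirely,
# and save_strategy="epoch" saves once per epoch instead.
```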
To save your model at the end of training, you should use trainer.save_model(optional_output_dir), which will call the save_pretrained method of your model behind the scenes (optional_output_dir is optional and will default to the output_dir you set).
Hello. Thank you very much for the detailed answer!
By the way, if I create a model class that inherits from torch.nn.Module and slightly alter a huggingface pretrained model (e.g. by adding a different classification head), then train it using native PyTorch, I should use torch.save() instead, right?
You should subclass PreTrainedModel if your model is very similar to a Transformers model, to be able to retain the full functionality. Otherwise yes, you should just use torch.save().
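For the torch.save() route, a minimal sketch looks like this. Here a plain nn.Linear stands in for the pretrained backbone purely for illustration (imagine an AutoModel.from_pretrained(...) call in its place); saving the state_dict rather than the whole module object is the usual PyTorch recommendation:

```python
import os
import tempfile

import torch
from torch import nn

class CustomClassifier(nn.Module):
    """Custom model with its own head; `backbone` is a stand-in
    for a Hugging Face pretrained model."""
    def __init__(self, hidden_size=32, num_labels=2):
        super().__init__()
        self.backbone = nn.Linear(16, hidden_size)  # imagine AutoModel.from_pretrained(...)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))

model = CustomClassifier()

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.pt")
    # Save only the weights (the state_dict), not the module object:
    torch.save(model.state_dict(), path)

    # To load, rebuild the architecture and restore the weights:
    restored = CustomClassifier()
    restored.load_state_dict(torch.load(path))

assert torch.equal(model.head.weight, restored.head.weight)
```

Note that, unlike save_pretrained, this gives you no config.json; you are responsible for rebuilding the same architecture before calling load_state_dict.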
Cool. As far as I can see, PreTrainedModel inherits from torch.nn.Module, so I guess there shouldn’t be much difference.
Thank you for everything. Have a nice day.