I’m currently using model.push_to_hub() from a notebook, and it saves the model correctly.
I have also seen an example of how to load a model using a version (which is just a branch?)
My problem is - how do I save a model to a new version using the APIs?
Since models are stored on the Hub in a git repository, every time you use push_to_hub it will automatically create a new commit on the model repo, which is effectively a new version. In this model, revuze/my-awesome-model-2-v1, for example, you can see two commits in the history that say “add model”, probably from running push_to_hub twice.
When you run push_to_hub you can choose to set a commit message (Models), among other options. And when you want to download your model, you can pin a specific version by passing a particular Git commit hash (e.g. 5f22418800ced4f10a5f0ef1ba67748d50d3661b for the first version of the model in your repo above), a branch name, or a Git tag. You can see more about that here: How to download files from the Hub
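To make this concrete, here is a minimal sketch of pushing with a commit message and then loading that exact version back. The repo id, commit message, and the helper function name are illustrative, not from the thread; the imports are kept inside the function so the sketch stays self-contained even where transformers is not installed.

```python
def push_and_pin(model, tokenizer, repo_id: str, commit_hash: str):
    """Push a model with a commit message, then reload a pinned version.

    repo_id and commit_hash are placeholders for illustration.
    """
    from transformers import AutoModel  # deferred import for the sketch

    # Each push creates a new commit on the model repo; the commit
    # message is what you will later see in the repo's history.
    model.push_to_hub(repo_id, commit_message="Retrain with more data")
    tokenizer.push_to_hub(repo_id, commit_message="Retrain with more data")

    # Later, load exactly one version by passing its commit hash
    # (a branch name or tag also works) as the revision.
    return AutoModel.from_pretrained(repo_id, revision=commit_hash)
```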
As an addition to @NimaBoscarino’s fantastic answer, all the Auto classes, such as AutoModel, have a revision parameter you can use to specify the Git commit hash.
Here is an example:
model = AutoModel.from_pretrained(
    "julien-c/EsperBERTo-small",
    revision="v2.0.1",  # tag name, branch name, or commit hash
)
I understand now, thank you.
So as a follow-up question - how do I set a tag while pushing from the API (or get the commit hash of the current push)?
I’m trying to implement an automated workflow, without needing to look at the UI for the older versions.
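One way this might be automated (a sketch, assuming huggingface_hub’s HfApi.upload_folder and HfApi.create_tag; the repo id, folder path, and function name are illustrative):

```python
def push_and_tag(repo_id: str, folder: str, tag: str) -> str:
    """Upload a local folder and tag the resulting commit.

    All arguments are placeholders for illustration.
    """
    from huggingface_hub import HfApi  # deferred import for the sketch

    api = HfApi()
    # upload_folder returns a CommitInfo; its .oid field is the hash of
    # the commit that was just created, so no UI lookup is needed.
    commit = api.upload_folder(repo_id=repo_id, folder_path=folder)

    # Attach a tag to that exact commit; the model can later be loaded
    # with from_pretrained(repo_id, revision=tag).
    api.create_tag(repo_id, tag=tag, revision=commit.oid)
    return commit.oid
```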
Hi, I’ll revive this one:
I understand the setup, but I have now updated a model (changed both the tokenizer and the model files) and called push_to_hub on both the model and the tokenizer. However, after running this, the tokenizer is still the old one, and some old model checkpoint files are still in the repo. I went from a model with 3 checkpoint files to 9, and since all of them have different names the nine new files were simply added. For the tokenizer, though, shouldn’t it have recognized that the tokenizer files were completely replaced?
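One thing to note is that push_to_hub uploads files; it does not delete remote files that no longer exist locally, which is why the old checkpoints linger. A possible workaround is a sketch like the one below, assuming huggingface_hub’s upload_folder with its delete_patterns parameter; the repo id, folder path, and glob patterns are illustrative guesses, not from the thread:

```python
def mirror_folder(repo_id: str, folder: str) -> None:
    """Upload a local folder, deleting stale remote files that match
    the given patterns but are absent locally. Arguments are placeholders.
    """
    from huggingface_hub import HfApi  # deferred import for the sketch

    api = HfApi()
    api.upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        # Remote files matching these patterns that are not present in
        # the local folder get deleted in the same commit.
        delete_patterns=["*.bin", "tokenizer*"],
        commit_message="Sync repo with local folder",
    )
```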