Hub and Versioning, how to save a new version?

I’m currently using model.push_to_hub() from a notebook, and it saves the model correctly.
I have also seen an example of how to load a specific version (which is just a branch?).

My question is: how do I save a model as a new version using the APIs?

Help would be greatly appreciated, thanks

Hello revuze, welcome!

Since models are stored on the Hub as Git repositories, every call to push_to_hub automatically creates a new commit on the model repo, which is effectively a new version. In your model revuze/my-awesome-model-2-v1, for example, the commit history shows two commits titled “add model”, probably from running push_to_hub twice.

When you run push_to_hub, you can set a commit message (see the “Models” documentation), among other options. When you download your model, you can select a specific version with a Git commit hash (e.g. 5f22418800ced4f10a5f0ef1ba67748d50d3661b for the first version of the model in your repo above), a branch name, or a Git tag. You can read more about that in the guide “How to download files from the Hub”.
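A minimal sketch of both steps, assuming a transformers model (or tokenizer) object and the huggingface_hub library. The helper names push_new_version and download_version are hypothetical, not part of either library:

```python
def push_new_version(model, repo_id: str, message: str) -> None:
    """Push the model; every push creates a new commit (a new version) on the Hub."""
    # commit_message replaces the default message (e.g. "add model") in the repo history
    model.push_to_hub(repo_id, commit_message=message)


def download_version(repo_id: str, filename: str, revision: str) -> str:
    """Download one file exactly as it was at `revision` and return its local path.

    `revision` can be a commit hash, a branch name, or a tag name.
    """
    # imported here so this sketch can be loaded without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, revision=revision)
```

For example, download_version("revuze/my-awesome-model-2-v1", "config.json", "5f22418800ced4f10a5f0ef1ba67748d50d3661b") would fetch that file as of the first push, assuming the repo contains a config.json.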

As an addition to @NimaBoscarino’s fantastic answer: all the Auto classes, such as AutoModel, have a revision parameter you can use to specify a Git commit hash, a branch name, or a tag.

Here is an example:

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "julien-c/EsperBERTo-small", revision="v2.0.1"  # tag name, or branch name, or commit hash
)

(from the “Share a model” guide)
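The tokenizer classes accept the same revision argument, so one way to keep a model and its tokenizer in sync is to load both from the same revision. Because a branch such as main moves with every push (and tags can be deleted and re-created), an automated pipeline that needs exact reproducibility may prefer the full 40-character commit hash. A minimal sketch, assuming transformers is installed; load_pinned and is_commit_hash are hypothetical helper names:

```python
import re

# A full Git commit hash (SHA-1) is 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")


def is_commit_hash(revision: str) -> bool:
    """True if `revision` is a full commit hash rather than a branch or tag name."""
    return _FULL_SHA.fullmatch(revision) is not None


def load_pinned(repo_id: str, revision: str):
    """Load a model and its tokenizer from the same revision of a Hub repo."""
    if not is_commit_hash(revision):
        # branches (and possibly tags) can later point to a different commit
        print(f"warning: {revision!r} is not a commit hash; results may change over time")
    # imported here so is_commit_hash can be used without transformers installed
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(repo_id, revision=revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    return model, tokenizer
```

For example, load_pinned("julien-c/EsperBERTo-small", "v2.0.1") would load both objects from the v2.0.1 tag and print the warning, since a tag name is not a commit hash.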

I understand now, thank you.
As a follow-up question: how do I set a tag while pushing from the API (or get the commit hash of the current push)?
I’m trying to implement an automated workflow, without needing to look at the UI for older versions.

Thanks

There are several approaches. With the huggingface_hub library, you can add a tag as follows:

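from huggingface_hub import Repository
USER, REPO_NAME = "your-username", "your-model"  # placeholders
# clone the repo into the local folder "hub"
repo = Repository(
    "hub",
    clone_from=f"{USER}/{REPO_NAME}",
    revision="main",
)
# create an annotated tag and push it to the remote
repo.add_tag("v4.5.0", message="This is an annotated tag", remote="origin")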

The downside is that this requires cloning the repo locally.
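Recent releases of huggingface_hub also let you tag a commit over HTTP, without a local clone, via HfApi.create_tag, and they recommend these HfApi methods over the git-based Repository class. A minimal sketch that tags the commit your last push created; it assumes you are already authenticated (for example with huggingface-cli login) and that nothing else was pushed to the repo in between. The helper name tag_latest_version is hypothetical:

```python
from typing import Optional


def tag_latest_version(repo_id: str, tag: str, message: Optional[str] = None) -> str:
    """Tag the latest commit on the default branch and return its commit hash."""
    # imported here so the sketch only needs huggingface_hub when actually called
    from huggingface_hub import HfApi

    api = HfApi()
    # hash of the latest commit, i.e. the version the last push_to_hub created
    sha = api.model_info(repo_id).sha
    # pin the tag to that exact commit instead of to whatever "main" points to later
    api.create_tag(repo_id, tag=tag, tag_message=message, revision=sha)
    return sha
```

For example, after model.push_to_hub("your-username/your-model", commit_message="retrain") you could call tag_latest_version("your-username/your-model", "v2") and later load that exact version with revision="v2".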

If you just want the latest commit hash, you can use model_info. I don’t think we currently support automatically retrieving older commits, though.

>>> from huggingface_hub import model_info
>>> model_info("gpt2")
ModelInfo: {
    modelId: gpt2
    sha: 6c0e6080953db56375760c0471a8c5f2929baf11
    ...
}
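More recent huggingface_hub releases include HfApi.list_repo_commits, which returns a repo’s commit history (most recent first), so older versions can be found without the UI. A minimal sketch, assuming such a release is installed; the helper name list_versions is hypothetical:

```python
from typing import List, Tuple


def list_versions(repo_id: str) -> List[Tuple[str, str]]:
    """Return (commit hash, commit title) pairs for a model repo, most recent first."""
    # imported here so the sketch only needs huggingface_hub when actually called
    from huggingface_hub import HfApi

    commits = HfApi().list_repo_commits(repo_id)
    return [(c.commit_id, c.title) for c in commits]
```

Each returned hash can then be passed as revision= to from_pretrained or hf_hub_download to load that exact version.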

Any ideas or feedback are very welcome in both the huggingface/huggingface_hub and huggingface/transformers GitHub repositories.

cc @adrin @lysandre

Hi, I’ll revive this one:
I understand the setup, but I have now updated a model: I changed both the tokenizer and the model files and called push_to_hub on both the model and the tokenizer. However, after running this, the tokenizer is still the old one and some of the old model checkpoint files are still in the repo. I went from a model with 3 checkpoint files to one with 9, and since they all have different names, the nine new files were simply added alongside the old ones. For the tokenizer, though, shouldn’t it have recognized that the tokenizer files had completely changed?

What is the correct procedure for this setup?