How to fork (in the git sense) a model repository?

Hi,

For production, we need to use models from the Hub whose updates we control.

Ideally, we need a way to fork the model repo into our own (public) organization, so that we control the updates ourselves. We cannot run the risk that someone deletes a repo or changes it without us knowing.

We could create a “copy” of the model manually and re-commit it to our organization, but this is not ideal, as we would lose the ability to track and merge future updates of the original repo.

Is there any way to achieve a fork in the current state of the Hugging Face Hub?

Thanks in advance,

Alex Combessie


You can add the original repository as an “upstream” remote in order to track and merge future updates, like so:

git remote add upstream <URL of model>.git

You can then sync again by doing:

git fetch upstream
git rebase upstream/main
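
The branch to rebase onto depends on the upstream repo's default branch (`master` on some repos, `main` on others). Here is a minimal sketch that asks the remote instead of hardcoding the name. It assumes the remote is called `upstream` as above; the helper names `upstream_default_branch` and `sync_with_upstream` are hypothetical, not part of git.

```shell
# Sketch only: helper names are made up for illustration.

upstream_default_branch() {
    # `git ls-remote --symref <remote> HEAD` prints a line such as
    # "ref: refs/heads/main<TAB>HEAD"; keep only the branch name.
    git ls-remote --symref "${1:-upstream}" HEAD |
        sed -n 's|^ref: refs/heads/\([^[:space:]]*\)[[:space:]]*HEAD$|\1|p'
}

sync_with_upstream() {
    remote="${1:-upstream}"
    branch="$(upstream_default_branch "$remote")"
    if [ -z "$branch" ]; then
        echo "could not detect the default branch of $remote" >&2
        return 1
    fi
    # Fetch the latest upstream commits, then replay local work on top of them.
    git fetch "$remote" && git rebase "$remote/$branch"
}
```

Run `sync_with_upstream` from inside your clone after adding the remote. If the repo stores large files with Git LFS, the rebase may still need the LFS objects to be reachable.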

Thanks Niels! Let me try that and report findings.

So after investigating, I hit a blocker right here:

(py36_bert) alexandrecombessie@MacBook-Pro-7 average_word_embeddings_glove.6B.300d % git rebase upstream/main

First, rewinding head to replay your work on top of it...
Downloading 0_WordEmbeddings/pytorch_model.bin (480 MB)
Error downloading object: 0_WordEmbeddings/pytorch_model.bin (d819348): Smudge error: Error downloading 0_WordEmbeddings/pytorch_model.bin (d819348e583fca49cf3980e34505d52a3f842064ebd9dc255484125357771240): [d819348e583fca49cf3980e34505d52a3f842064ebd9dc255484125357771240] Object does not exist: [404] Object does not exist

Errors logged to /Users/alexandrecombessie/huggingface/dataikunlp/average_word_embeddings_glove.6B.300d/.git/lfs/logs/20210901T164955.693927.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: 0_WordEmbeddings/pytorch_model.bin: smudge filter lfs failed

I have googled the error and it seems linked to this issue, which seems pretty complex. Do you have some advice?

Cheers,

Alex

After more investigating, I managed to make the rebase work, using this script:

huggingface-cli login
huggingface-cli repo create ${MODEL_NAME} --organization ${NEW_ORG}
git lfs install --skip-smudge
git clone https://huggingface.co/${NEW_ORG}/${MODEL_NAME}
cd ${MODEL_NAME}
git remote add upstream https://huggingface.co/${ORIGINAL_ORG}/${MODEL_NAME}
git fetch upstream
git rebase upstream/main
git push --force-with-lease

I thought I had solved it, except that loading the new model from my code fails with a 403 Forbidden error:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://cdn-lfs.huggingface.co/DataikuNLP/average_word_embeddings_glove.6B.300d/d819348e583fca49cf3980e34505d52a3f842064ebd9dc255484125357771240

Is this issue solvable from my end or does it require a change to your model hub infrastructure?

Thanks for your help,

Alex

Hi,

This seems to work:

git lfs clone https://huggingface.co/${NEW_ORG}/${MODEL_NAME}
cd ${MODEL_NAME}
git lfs install --skip-smudge --local # --local affects only this clone, try without it
git remote add upstream https://huggingface.co/${ORIGINAL_ORG}/${MODEL_NAME}
git fetch upstream
git checkout -b temp upstream/main
git rebase main # resolve conflicts if needed and finish rebasing
git lfs pull upstream
git push origin temp
git lfs push --all origin temp
git lfs install --force --local

For reference: “how to rebase with git lfs?”, issue #1287 in git-lfs/git-lfs on GitHub.
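
Since the steps above switch smudging off and back on, it can help to check whether a file in the working tree is the real binary or still a small LFS pointer before pushing. A rough sketch, assuming only that pointer files start with the standard `version https://git-lfs.github.com/spec/v1` line; the helper names `is_lfs_pointer` and `list_lfs_pointers` are hypothetical.

```shell
# Sketch only: helper names are made up for illustration.

is_lfs_pointer() {
    # A Git LFS pointer is a short text file whose first line names the spec
    # version. Read just the first 42 bytes so large binaries are not slurped.
    first="$(head -c 42 "$1" 2>/dev/null | tr -d '\000')"
    [ "$first" = "version https://git-lfs.github.com/spec/v1" ]
}

list_lfs_pointers() {
    # Print every file under the given directory (default: current directory)
    # that is still a pointer, skipping the .git directory itself.
    find "${1:-.}" -name .git -prune -o -type f -print |
        while IFS= read -r f; do
            if is_lfs_pointer "$f"; then
                echo "$f"
            fi
        done
}
```

If `list_lfs_pointers` prints `0_WordEmbeddings/pytorch_model.bin` after `git lfs pull upstream`, the weights were not downloaded, and whether a later push can upload them is something I would verify rather than assume.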


Hi @dataiku, did the last snippet work?