For production, we need to use models from the Hub whose updates we control.
Ideally, we need a way to fork the model repo into our own (public) organization, so that we control the updates ourselves. We cannot run the risk that someone deletes a repo or changes it without us knowing.
We could create a “copy” of the model manually and re-commit it to our organization, but this is not ideal, as we would lose the ability to track and merge future updates of the original repo.
Is there any way to achieve a fork in the current state of the Hugging Face Hub?
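For context, the closest thing I can think of with plain git is to keep the original repo as an `upstream` remote and push to a repo in our organization as `origin`. Below is a minimal sketch of that setup, not an official Hub feature. The function name `fork_model_repo`, the `run` helper, the `DRY_RUN` switch, and the URLs are hypothetical placeholders. It assumes the target repo already exists in our organization, the default branch is `main`, and git-lfs is installed.

```shell
# Sketch only: mirror an upstream model repo into our own organization.
# Set DRY_RUN=1 to print the git commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: fork_model_repo <upstream_url> <fork_url> <workdir>
fork_model_repo() {
  fork_upstream_url="$1"
  fork_target_url="$2"
  fork_workdir="$3"
  # Clone the original repo, then rename its remote to "upstream".
  run git clone "$fork_upstream_url" "$fork_workdir" &&
  run git -C "$fork_workdir" remote rename origin upstream &&
  # Point "origin" at the (already created) repo in our organization.
  run git -C "$fork_workdir" remote add origin "$fork_target_url" &&
  # Download every LFS object from upstream so it is available locally.
  run git -C "$fork_workdir" lfs fetch --all upstream &&
  # Push the history to our copy (assumes the branch is named "main").
  run git -C "$fork_workdir" push -u origin main
}
```

Calling it with `DRY_RUN=1` and three arguments prints the five git commands without touching the network, which makes it easy to review the steps before running them for real.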
So after investigating, I hit a blocker right here:
(py36_bert) alexandrecombessie@MacBook-Pro-7 average_word_embeddings_glove.6B.300d % git rebase upstream/main
First, rewinding head to replay your work on top of it...
Downloading 0_WordEmbeddings/pytorch_model.bin (480 MB)
Error downloading object: 0_WordEmbeddings/pytorch_model.bin (d819348): Smudge error: Error downloading 0_WordEmbeddings/pytorch_model.bin (d819348e583fca49cf3980e34505d52a3f842064ebd9dc255484125357771240): [d819348e583fca49cf3980e34505d52a3f842064ebd9dc255484125357771240] Object does not exist: [404] Object does not exist
Errors logged to /Users/alexandrecombessie/huggingface/dataikunlp/average_word_embeddings_glove.6B.300d/.git/lfs/logs/20210901T164955.693927.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: 0_WordEmbeddings/pytorch_model.bin: smudge filter lfs failed
I searched for the error and it appears to be linked to this issue, which looks pretty complex. Do you have any advice?
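In case it helps narrow things down, my current guess is that the 404 comes from the smudge filter asking a remote that does not host the LFS object, because the object only exists on `upstream`. A minimal sketch of a workaround, assuming that guess is right, is to download upstream's LFS objects explicitly before rebasing. The function name `rebase_on_upstream`, the `run` helper, and the `DRY_RUN` switch are hypothetical; the remote and branch names are the ones from the log above.

```shell
# Sketch only: fetch upstream's LFS objects before rebasing a fork.
# Set DRY_RUN=1 to print the git commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: rebase_on_upstream [remote] [branch]   (defaults: upstream main)
rebase_on_upstream() {
  rebase_remote="${1:-upstream}"
  rebase_branch="${2:-main}"
  # Update the remote-tracking refs for the upstream remote.
  run git fetch "$rebase_remote" &&
  # Download LFS objects from upstream for all known refs, so the
  # smudge filter can find them locally during the rebase.
  run git lfs fetch --all "$rebase_remote" &&
  # Replay local commits on top of the upstream branch.
  run git rebase "$rebase_remote/$rebase_branch"
}
```

Run from inside the cloned repo. I have not confirmed that this resolves the error in every setup, so treat it as something to try, not a verified fix.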