Use this topic for any question about Chapter 4 of the course.
Can someone take a look at the Chapter 4 notebook? The `.push_to_hub()` method isn’t working. At first the error was about `git-lfs`, and even installing that doesn’t seem to help…
Could you tell us more? Which notebook are you running? When does it fail?
This section’s colab notebook: Sharing models and tokenizers - Hugging Face Course
When I execute `model.push_to_hub("dummy-model")` it throws an error. I tried to solve it with `sudo apt-get install git-lfs`, but it still doesn’t work.
By the way, I’m running all my code in Google Colab, so you should be able to reproduce the error by running the code there.
@khalidsaifullaah I see the following error:

`ValueError: If not specifying clone_from, you need to pass Repository a valid git clone.`

Do you see the same? If yes, for the time being, setting `use_temp_dir=True` in the `push_to_hub` params solved the issue for me.
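In case it helps, here is a minimal sketch of that workaround. The helper name is mine, not part of `transformers`, and it assumes you are already authenticated (e.g. via `notebook_login()`):

```python
def push_from_colab(model, repo_name="dummy-model"):
    # On Colab the notebook's working directory is not a git clone,
    # which is what triggers the ValueError. use_temp_dir=True makes
    # push_to_hub clone the repo into a fresh temporary directory
    # instead of assuming the current one is a valid clone.
    return model.push_to_hub(repo_name, use_temp_dir=True)
```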
Yeah, I saw this one as well. Thanks for the solution, I’ll try using it now…
But I was actually wondering: @sgugger didn’t face these errors when he ran the same code in the “Push To Hub” video (maybe it’s something to do with Colab’s dependencies? In the video he used a Jupyter notebook).
I have updated the install instructions in the notebooks to reflect all the necessary steps. Could you try again (on the latest version of the Colab notebook) and tell me if it is working?
just checked the notebook, it’s working fine now!
hey @khalidsaifullaah could you please share a minimal example so i can try to reproduce the error on my side?
thanks for the quick response @lewtun!
I was actually trying to pretrain a RoBERTa model on GCP’s TPU using HF’s RoBERTa Flax training script. I followed the steps described here: transformers/examples/flax/language-modeling at master · huggingface/transformers (github.com)
For the time being, I sidestepped the error by removing the `--push_to_hub` flag when running the training script.
ok thanks for the info! since the flax integration is quite new in `transformers`, it’s possible there are some rough edges when it comes to integration with the hub.
i’ll try to reproduce the error and report back
ps. you should be able to push the model to the hub using plain old git-lfs if you really need it
Thanks @lewtun! Really appreciate your support.
Just wanted to be clear on one thing-
Since I’ve removed the `--push_to_hub` flag, `model.save_pretrained()` now saves the checkpoints to my local directory. When my training is done, should I just run the following to upload everything to my Hub repo?

git add .
git commit -m "model trained"
git push origin main
Are any other commands necessary (like `git lfs`)? If so, in which order should they go? Could you maybe give some suggestions?
yes, for files larger than 10MB you’ll need to run `git lfs track` before `git add`, e.g.

git lfs track some_large_file.huge
git add .gitattributes
git add some_large_file.huge
git commit -m "add model files"
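If it helps, the sequence above can be sketched as a small Python helper (the function name is hypothetical; it assumes `git` and `git-lfs` are installed and that the current directory is a clone of your model repo on the Hub; `dry_run=True` just returns the commands so you can check the order without running anything):

```python
import subprocess

def upload_large_file(path, message="add model files", dry_run=False):
    # Hypothetical helper sketching the command order described above.
    cmds = [
        ["git", "lfs", "track", path],     # route the large file through LFS
        ["git", "add", ".gitattributes"],  # commit the tracking rule itself
        ["git", "add", path],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "main"],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

The key point is that `git lfs track` must come before `git add`, and that `.gitattributes` (where the tracking rule lives) has to be committed as well.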
Thanks a lot!
Getting this when I tried to push to the Hub. I did `git lfs track flax_model.msgpack` and `git lfs track "*tfevents*"` before committing and pushing…
finally was able to push with the help of this - Failed to push model repo · Issue #8504 · huggingface/transformers (github.com)
Going through the last part got me thinking on some questions regarding quota & limits:
- Is there any limit on the number of private and public repos a user can have?
- Is there any limit to the size an individual dataset/model repo can have? Or a per-account limit? (e.g. on Kaggle, each user gets a fixed number of GBs to host, and the total must stay within that limit.)
- If I do 1000 commits of a 1GB model, is that 1TB going to be ‘always-accessible’, or are there limitations w.r.t. git history?
- Is there any limit on number of downloads per model (specifically a privately uploaded model)?
These questions are not just about what’s supported right now, but also about the near future. E.g. if I upload a public/private model (hypothetically, for either commercial or non-commercial use) and don’t use the Inference API (just storage), will there be any threat to the 1) stability or 2) scalability of such a pipeline?
hey @dk-crazydiv in the near-to-mid future, there are no limits
Thank you @lewtun, but I am imagining myself crossing Gartner’s hype cycle and identifying the plateau on which the offering lands. Even though the fan in me would love to see this be possible, practically it raises many concerns. Could you please elaborate a bit?
No limits on:
- number of private repos on the Model Hub / Dataset Hub
- number of public repos
- size of the repos
- number of commits to those repos
- number of downloads from those repos
- download speed from those repos (no speed cap)
And also: if someone subjectively “abuses” the policies and takes “unfair” advantage, does HF hold the right to ban them? If yes, then it becomes even more concerning, as it is very subjective.