Chapter 4 questions

Use this topic for any question about Chapter 4 of the course.

Can someone take a look at the Chapter 4 notebook? The push_to_hub() method isn’t working. At first the error was about ‘git-lfs’, and even installing that doesn’t seem to fix it…

Thanks

Could you tell us more? Which notebook are you running? When does it fail?

This section’s colab notebook: Sharing models and tokenizers - Hugging Face Course

When I execute model.push_to_hub("dummy-model") it throws an error. I tried to fix it by running sudo apt-get install git-lfs, but it still doesn’t work.

By the way, I’m running all my code in Google Colab, so I think you could reproduce the error by running the notebook there.

Thanks

@khalidsaifullaah I see the following error:
ValueError: If not specifying clone_from, you need to pass Repository a valid git clone.

Do you see the same? If yes, for the time being, setting use_temp_dir=True in the push_to_hub params solved the issue for me.
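
A minimal sketch of that workaround, assuming model is any object that exposes the push_to_hub method from transformers and that you are already logged in to the Hub. The helper name push_with_temp_dir is hypothetical and used only for illustration:

```python
def push_with_temp_dir(model, repo_name):
    """Push `model` to the Hub, staging files in a temporary directory.

    `use_temp_dir=True` is the workaround reported above for the
    "you need to pass Repository a valid git clone" error.
    """
    return model.push_to_hub(repo_name, use_temp_dir=True)


# Example usage (needs transformers installed and a Hub login):
#
#   from transformers import AutoModelForMaskedLM
#   model = AutoModelForMaskedLM.from_pretrained("camembert-base")
#   push_with_temp_dir(model, "dummy-model")
```

The same keyword can also be passed directly, i.e. model.push_to_hub("dummy-model", use_temp_dir=True), which is all the helper does.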

Yeah, I saw this one as well. Thanks for the solution, I’ll try using it now…
But I was actually wondering why @sgugger didn’t face these errors when he ran this same code in the “Push To Hub” video (maybe it has something to do with Colab’s dependencies? In the video he used a Jupyter notebook).

I have updated the install instructions in the notebooks to reflect all the necessary steps. Could you try again (on the latest version of the Colab notebook) and tell me if it is working?
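
Before retrying, it can help to confirm that git and git-lfs are actually on the PATH of the runtime. The sketch below is not the notebook’s install cell (those steps are not reproduced here); missing_tools is a hypothetical helper that only reports which executables it cannot find:

```python
import shutil


def missing_tools(tools=("git", "git-lfs")):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Example usage:
#
#   missing = missing_tools()
#   if missing:
#       print("Not installed:", ", ".join(missing))
```

If git-lfs is reported missing, installing it (for example with sudo apt-get install git-lfs, as tried above) and then running git lfs install is the usual next step.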

thanks @sgugger!
just checked the notebook, it’s working fine now!

I’m getting this even though git lfs is installed :frowning:

hey @khalidsaifullaah could you please share a minimal example so i can try to reproduce the error on my side?

thanks for the quick response @lewtun!
I was actually trying to pretrain a RoBERTa model on GCP’s TPU using HF’s RoBERTa Flax training script. I followed these steps to do it - transformers/examples/flax/language-modeling at master · huggingface/transformers (github.com)

For the time being, I sidestepped the error by removing the --push_to_hub flag when running the training script.

ok thanks for the info! since the flax integration is quite new in transformers it’s possible there are some rough edges when it comes to integration with the hub.

i’ll try to reproduce the error and report back

ps. you should be able to push the model to the hub using plain old git-lfs if you really need it :slight_smile:

Thanks @lewtun! Really appreciate your support.

Just wanted to be clear on one thing-
Since I’ve removed the push_to_hub flag, the model.save_pretrained() method saves the checkpoints in my local directory after every epoch. When my training is done, should I just run the following to upload everything to my model repo on the Hub?

git add .
git commit -m "model trained"
git push origin main

Are any other commands necessary (like git lfs)? If so, in which order should they go? Could you give some suggestions?

Thanks

yes, for files larger than 10MB you’ll need to run git lfs track before git add, e.g.

git lfs track some_large_file.huge
git add .gitattributes
git add some_large_file.huge
git commit -m "add model files"

hth!
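
The ordering above can be sketched as a small helper that scans a checkpoint folder and lists the git commands to run. This is only an illustration under stated assumptions: build_push_commands is a hypothetical name, the 10MB threshold is taken from the reply above, the remote and branch (origin, main) are taken from the question, and the commands are returned rather than executed:

```python
from pathlib import Path

TEN_MB = 10 * 1024 * 1024  # threshold mentioned in the reply above


def build_push_commands(repo_dir, message="add model files", threshold=TEN_MB):
    """Return git commands (as argument lists) for committing a folder.

    Files larger than `threshold` bytes get a `git lfs track` entry
    before anything is added, matching the order described above.
    """
    repo = Path(repo_dir)
    large = sorted(
        p.relative_to(repo).as_posix()
        for p in repo.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(repo).parts  # skip git internals
        and p.stat().st_size > threshold
    )
    commands = [["git", "lfs", "track", name] for name in large]
    if large:
        # `git lfs track` records its patterns in .gitattributes
        commands.append(["git", "add", ".gitattributes"])
    commands += [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "main"],
    ]
    return commands
```

Each returned list can be passed to subprocess.run(cmd, cwd=repo_dir, check=True) once the output looks right.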

Thanks a lot! :slight_smile:

Getting this when I tried to push to the Hub. I ran git lfs track flax_model.msgpack and git lfs *tfevents* before committing and pushing…

Finally was able to push with the help of this - Failed to push model repo · Issue #8504 · huggingface/transformers (github.com)

Hi team,
Going through the last part got me thinking about some questions regarding quotas and limits:

  • Is there any limit on the number of private and public repos a user can have?
  • Is there any limit on the size of an individual dataset/model repo? Or a per-account limit? (e.g. on Kaggle, each user gets a fixed number of GBs for hosting, and the sum total must remain within that limit.)
  • If I make 1000 commits of a 1GB model, is the full 1TB going to be ‘always-accessible’, or are there some limitations with respect to git history?
  • Is there any limit on the number of downloads per model (specifically a privately uploaded model)?

These questions are not necessarily about what’s supported right now, but also about the near future. E.g. if I upload a public/private model (hypothetically for both commercial and non-commercial use) and do not use the Inference API (just storage), will there be any threat to the 1) stability or 2) scalability of such a pipeline?

hey @dk-crazydiv in the near-to-mid future, there are no limits :wink:

Thank you @lewtun, but I am imagining myself crossing the Gartner hype cycle and identifying the plateau on which the offering lands. The fan in me would love to see this be possible, but practically it raises many concerns. Could you please elaborate a bit?
No limits on:

  • number of private repos on modelhub/datasethub
  • number of public repos
  • size of the repos
  • number of commits on those repos
  • number of downloads from those repos
  • speed cap on downloads of those repos.

Also, if someone subjectively “abuses” the policies and takes “unfair” advantage, does HF reserve the right to ban them? If yes, then it becomes even more concerning, as that is very subjective.

Before going deep into specific use cases, it would be great if you could point me to the policies and terms of use, as they will probably clear up many of the cases I am thinking of.