I’m trying to upload my first dataset. It’s only ~75,000 files and about 1 GB, but I immediately got 429 errors.
It is possible to mitigate this on a Pro or Enterprise plan, but it may be quicker to reduce the number of requests.
For example, uploading with upload_folder instead of upload_file will result in fewer requests.
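For reference, here is a minimal Python sketch of the upload_folder approach (the repo id and local path are placeholders):

from huggingface_hub import HfApi

api = HfApi()
# One call uploads the whole folder and creates a single commit,
# instead of one request per file as with upload_file.
api.upload_folder(
    repo_id="username/repository",     # placeholder
    folder_path="/path/to/dataset",    # placeholder
    repo_type="dataset",
)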
I was using huggingface-cli upload-large-folder.
Is upload_folder better?
upload-large-folder is still under development and is intended for cases where the size is truly large, so I think upload_folder is better if the total size is within 50 GB.
Does that mean:
huggingface-cli upload_folder username/repository /path/to/dataset --repo-type=dataset --num-workers=12
I’m getting errors saying that both upload_folder and upload-folder are invalid arguments.
valid choices {download,upload,repo-files,env,login,whoami,logout,auth,repo,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag,version,upload-large-folder}
Thanks.
It works with
huggingface-cli upload mysocratesnote/jfk-files-text ~/Desktop/extracted_text/releases --repo-type=dataset
But it’s recommending I do it another way:
Consider using hf_transfer for faster uploads. This solution comes with some limitations. See Environment variables for more details.
It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use HfApi().upload_large_folder(...) / huggingface-cli upload-large-folder instead. For more details, check out Upload files to the Hub.
Start hashing 73480 files.
Finished hashing 73480 files.
This failed shortly after it started with the ‘upload’ option.
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "/Users/user/miniforge3/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/datasets/mysocratesnote/jfk-files-text/commit/main
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/user/miniforge3/bin/huggingface-cli", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py", line 57, in main
service.run()
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/commands/upload.py", line 206, in run
print(self._upload())
^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/commands/upload.py", line 301, in _upload
return self.api.upload_folder(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 1624, in _inner
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 4934, in upload_folder
commit_info = self.create_commit(
^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 1624, in _inner
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 4285, in create_commit
hf_raise_for_status(commit_resp, endpoint_name="commit")
File "/Users/user/miniforge3/lib/python3.12/site-packages/huggingface_hub/utils/_http.py", line 482, in hf_raise_for_status
raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/datasets/mysocratesnote/jfk-files-text/commit/main
It works with upload-large-folder for a little while, but even with --num-workers=2 it quickly hits a rate limit again.
Is there no way to upload and specify a rate under the limit?
That’s strange. I don’t think the numbers are high enough to cause an error… @Wauplin
No, there is no hardcoded server-side limit, but there might be some technical issues. We are working on Fix dynamic commit size by maximizemaxwell · Pull Request #3016 · huggingface/huggingface_hub · GitHub to allow dynamic commit sizes, which should help mitigate the issue.
Thank you!
Not sure I understand all of this. Is there a way to upload at the “right speed” to avoid getting blocked? I’m on my third or fourth attempt. I can only upload a few thousand files without getting blocked. Even with a single worker, the rate limit keeps triggering. This is not a huge archive; it’s just a little over 1 GB. It’s taken about 4 hours just to upload about a third of it.
The problem is not the total size but the number of files (around 70k+ in total?). upload-large-folder was not meant for that at first (my bad, I designed it mostly for uploading folders with hundreds of large files rather than folders with tens of thousands of small files). The result is that we are committing them in chunks of 50, which produces hundreds of commits and triggers the rate limit. Fix dynamic commit size by maximizemaxwell · Pull Request #3016 · huggingface/huggingface_hub · GitHub is meant as a good workaround for that, but it’s not finished yet.
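To put numbers on it: roughly 73,480 files committed 50 at a time works out to on the order of 1,470 commits for a single upload, which is far more than the commit rate limit is meant to absorb.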
In the meantime I don’t have many suggestions except making the upload more manual (i.e. running huggingface-cli upload on subparts of the repo), for example as sketched below.
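Something along these lines, where SUBFOLDER is a placeholder for one part of the local folder (untested here; the third positional argument of huggingface-cli upload is the destination path inside the repo):

huggingface-cli upload mysocratesnote/jfk-files-text ~/Desktop/extracted_text/releases/SUBFOLDER SUBFOLDER --repo-type=dataset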
This problem may be exacerbated by the fact that huggingface-cli keeps trying over and over after it already hit a 429 error. Ideally it would quit after getting that error a couple of times.
Oooh, I did not notice that the files are being uploaded as regular markdown files. This means that all the data is stored in the git history instead of as LFS files stored on S3. This is most certainly the culprit. Usually we try to avoid storing data “raw” like this, as it makes everything very slow. This is why git+LFS (and now git+xet) was developed.
If that doesn’t make any sense to you, it basically means that the way files are stored on the repo is not optimized. I would recommend:
- create a new separate repo
- make sure the .md files are tracked as LFS (can be done by modifying the .gitattributes file: .gitattributes · mysocratesnote/jfk-files-text at main; see the sketch after this list)
- upload files subpart by subpart (around 250 by 250 is good)
- once everything is uploaded, delete the original repo and move the new one under the previous namespace
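A rough Python sketch of the LFS-tracking and batched-upload steps (the new repo name is a placeholder, the glob assumes the files are .md, and the pause between commits is a guessed value to stay clear of the rate limit):

import time
from pathlib import Path
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()
repo_id = "mysocratesnote/jfk-files-text-new"    # placeholder name for the new repo
local_root = Path.home() / "Desktop/extracted_text/releases"

api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

# Track .md files with LFS before uploading any data.
# Note: this replaces the repo's auto-generated .gitattributes; append to it
# instead if you want to keep the default rules as well.
api.create_commit(
    repo_id=repo_id,
    repo_type="dataset",
    operations=[
        CommitOperationAdd(
            path_in_repo=".gitattributes",
            path_or_fileobj=b"*.md filter=lfs diff=lfs merge=lfs -text\n",
        )
    ],
    commit_message="Track .md files with LFS",
)

# Upload in batches of ~250 files, one commit per batch.
files = sorted(local_root.rglob("*.md"))
batch_size = 250
for i in range(0, len(files), batch_size):
    batch = files[i : i + batch_size]
    api.create_commit(
        repo_id=repo_id,
        repo_type="dataset",
        operations=[
            CommitOperationAdd(
                path_in_repo=str(f.relative_to(local_root)),
                path_or_fileobj=str(f),
            )
            for f in batch
        ],
        commit_message=f"Upload files {i + 1}-{i + len(batch)}",
    )
    time.sleep(5)    # arbitrary pause between commits; increase it if 429s still appear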
Very sorry about this situation but I think starting with a clean state is really needed here.
It seems like it failed even faster when using ‘upload’ rather than ‘upload-large-folder’.
When you say run it on parts of the repo one by one… OK but how do I ensure it’s uploaded to the right path?
If the repository looks like this:
├── 2017/ # 2017 release
│ ├── part_1/ # 2017 part 1
│ ├── part_2/ # 2017 part 2
│ ├── part_3/ # 2017 part 3
│ ├── part_4/ # 2017 part 4
│ └── part_5/ # 2017 part 5 (originally labeled "additional")
├── 2018/ # 2018 release
│ ├── part_1/ # 2018 part 1
│ └── part_2/ # 2018 part 2
Do I use huggingface-cli like this if I want to start with the 2017 subfolder?
huggingface-cli upload mysocratesnote/jfk-files-text/2017 ~/Desktop/extracted_text/releases/2017 --repo-type=dataset
Thanks.
Another (better) solution is to store the data in a format that does not require uploading each file individually. Typically, this could be .parquet files with columns like “date”, “filename”, and “content”, where each row is a markdown file. This way you will have only a few .parquet files to upload, which will solve all of your problems. Also, it will enable the Dataset Studio for your repo.
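A minimal sketch of that conversion, assuming the layout shown earlier, that the extracted files are .md, and that pandas with pyarrow is installed (the release year stands in for the suggested “date” column):

from pathlib import Path
import pandas as pd    # requires pyarrow (or fastparquet) for to_parquet

root = Path.home() / "Desktop/extracted_text/releases"
rows = []
for f in sorted(root.rglob("*.md")):    # adjust the glob if the extension differs
    rel = f.relative_to(root)
    rows.append({
        "release": rel.parts[0],        # e.g. "2017"
        "filename": str(rel),
        "content": f.read_text(errors="replace"),
    })

pd.DataFrame(rows).to_parquet("jfk_files_text.parquet", index=False)

The resulting file (or one file per release) can then be pushed with a single huggingface-cli upload call instead of tens of thousands of per-file requests.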
I think I figured that out. Thanks.