Max Retries Exceeded when Uploading Folder to Hub

I have a folder containing 20,000 CSV files, 8 GB in total. I use a Python script that calls the HF API to upload the folder to the Hub as a dataset.
This is the code I use to upload the folder:

import os

from huggingface_hub import HfApi

api = HfApi()

# token_HF: Hugging Face access token (definition not shown)
api.upload_folder(
    folder_path=os.getcwd(),
    repo_id="FDSRashid/Tarafs_Embedded",
    repo_type="dataset",
    token=token_HF,
)

The Python script is located in the same folder as the CSV files, which is why I use os.getcwd(). I get the following error:

urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:2427)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/FDSRashid/Tarafs_Embedded/commit/main (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\rashi\Hadith_Embed\merge_tarafs.py", line 29, in <module>
    api.upload_folder(
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 1208, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 4598, in upload_folder
    commit_info = self.create_commit(
                  ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 1208, in _inner
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 3599, in create_commit
    commit_resp = get_session().post(url=commit_url, headers=headers, data=data, params=params)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/FDSRashid/Tarafs_Embedded/commit/main (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))"), '(Request ID: 96ad289d-092e-4c3d-8642-18344650ad34)')

Is there a way to use the Hugging Face API to upload a folder with a lot of files to an HF dataset? If not, what should I use in my scenario?

Hi! The HF Hub supports at most 10k files per folder. This limit exists so that every dataset performs well when, for example, fetching files or downloading them locally.

You should be able to upload your dataset if you split it across multiple folders (named part0, part1, etc., for example).
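One way to prepare such folders is sketched below. It is only an illustration: the function name split_into_parts, the default of 5,000 files per folder, and the "*.csv" pattern are assumptions chosen for this thread, not anything the Hub requires beyond staying under the per-folder limit.

```python
import shutil
from pathlib import Path


def split_into_parts(src, dest, max_per_folder=5000, pattern="*.csv"):
    """Copy files matching `pattern` from `src` into dest/part0, dest/part1, ...

    Each part folder receives at most `max_per_folder` files.
    Returns the list of part folders that were created.
    (Hypothetical helper; the name and defaults are assumptions.)
    """
    src, dest = Path(src), Path(dest)
    # Non-recursive listing, sorted so the split is reproducible.
    files = sorted(p for p in src.glob(pattern) if p.is_file())
    parts = []
    for start in range(0, len(files), max_per_folder):
        part_dir = dest / f"part{start // max_per_folder}"
        part_dir.mkdir(parents=True, exist_ok=True)
        for f in files[start:start + max_per_folder]:
            shutil.copy2(f, part_dir / f.name)  # copy, leaving originals in place
        parts.append(part_dir)
    return parts
```

With 20,000 CSVs and 5,000 files per folder, this would produce four folders, part0 through part3. Copying doubles the disk usage (about 16 GB here); swapping shutil.copy2 for shutil.move avoids that if the originals do not need to stay in place.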

Also, you might be interested in how to upload a folder by chunks, and in the tips and tricks for large uploads.
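As a rough sketch of the idea (not the chunked-upload method from the docs), each part folder can be uploaded in its own commit with a simple retry, so that one dropped connection does not abort the whole 8 GB upload. The function name upload_parts, the retry count, and the wait time are assumptions; the only library call used is upload_folder with its folder_path, path_in_repo, repo_id, repo_type, token and commit_message parameters.

```python
import time
from pathlib import Path


def upload_parts(api, root, repo_id, repo_type="dataset", token=None,
                 max_attempts=3, wait_seconds=10):
    """Upload each part* subfolder of `root` as a separate commit.

    `api` is expected to be a huggingface_hub.HfApi instance.
    Returns the names of the folders that were uploaded.
    (Hypothetical helper; name, retry count and wait time are assumptions.)
    """
    uploaded = []
    # Lexicographic order (part10 before part2); the order does not matter here.
    for part_dir in sorted(p for p in Path(root).glob("part*") if p.is_dir()):
        for attempt in range(1, max_attempts + 1):
            try:
                api.upload_folder(
                    folder_path=str(part_dir),
                    path_in_repo=part_dir.name,  # keeps the part0, part1, ... layout
                    repo_id=repo_id,
                    repo_type=repo_type,
                    token=token,
                    commit_message=f"Upload {part_dir.name}",
                )
                uploaded.append(part_dir.name)
                break
            except Exception:
                # Broad catch on purpose: network and SSL errors vary by version.
                if attempt == max_attempts:
                    raise
                time.sleep(wait_seconds)
    return uploaded


# Example call (requires huggingface_hub and a valid token):
# from huggingface_hub import HfApi
# upload_parts(HfApi(), "parts", "FDSRashid/Tarafs_Embedded", token=token_HF)
```

Passing the api object in, rather than creating it inside the function, keeps the helper easy to try out with a stand-in object before touching the real repo.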
