I have a folder containing 20,000 CSV files, 8 GB in total. I use a Python script that calls the Hugging Face Hub API to upload the folder to the Hub as a dataset.
This is the code I use to upload the folder:
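import os
from huggingface_hub import HfApi

# token_HF holds my Hugging Face access token (defined elsewhere in the script)
api = HfApi()
api.upload_folder(
    folder_path=os.getcwd(),  # current working directory, where the CSV files live
    repo_id="FDSRashid/Tarafs_Embedded",
    repo_type="dataset",
    token=token_HF,
)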
The Python script is located in the same folder as the CSV files, which is why I use os.getcwd(). I get the following error:
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:2427)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/FDSRashid/Tarafs_Embedded/commit/main (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\rashi\Hadith_Embed\merge_tarafs.py", line 29, in <module>
api.upload_folder(
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 1208, in _inner
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 4598, in upload_folder
commit_info = self.create_commit(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 1208, in _inner
return fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\hf_api.py", line 3599, in create_commit
commit_resp = get_session().post(url=commit_url, headers=headers, data=data, params=params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 637, in post
return self.request("POST", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\huggingface_hub\utils\_http.py", line 67, in send
return super().send(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rashi\HF_Hub\Lib\site-packages\requests\adapters.py", line 517, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/FDSRashid/Tarafs_Embedded/commit/main (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))"), '(Request ID: 96ad289d-092e-4c3d-8642-18344650ad34)')
Is there a way to use the Hugging Face API to upload a folder with a large number of files to a HF dataset? If not, what should I use in my scenario?
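One workaround that may be worth trying is sketched below. The traceback shows the failure on the single POST to the commit endpoint, so the sketch assumes (without having verified it) that one commit carrying all 20,000 files is what the connection cannot handle, and splits the upload into several smaller commits. It uses `HfApi.create_commit` with `CommitOperationAdd`. The helper names `list_csv_files`, `batched` and `upload_in_batches`, as well as the batch size of 500, are made up for illustration and would need tuning.

```python
from pathlib import Path
from typing import Iterator, List


def list_csv_files(folder: str) -> List[Path]:
    """Return the CSV files directly inside `folder`, sorted for a stable order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


def batched(items: List[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upload_in_batches(folder: str, repo_id: str, token: str, batch_size: int = 500) -> None:
    """Upload the CSVs as several smaller commits instead of one very large commit."""
    # Imported here so the two helpers above work without huggingface_hub installed.
    from huggingface_hub import CommitOperationAdd, HfApi

    api = HfApi()
    files = list_csv_files(folder)
    for number, batch in enumerate(batched(files, batch_size), start=1):
        # One add operation per file; files are placed at the root of the repo.
        operations = [
            CommitOperationAdd(path_in_repo=path.name, path_or_fileobj=str(path))
            for path in batch
        ]
        api.create_commit(
            repo_id=repo_id,
            repo_type="dataset",
            operations=operations,
            commit_message=f"Upload CSV batch {number}",
            token=token,
        )
```

With the setup from the question, this would be called as `upload_in_batches(os.getcwd(), "FDSRashid/Tarafs_Embedded", token=token_HF)`. If one batch fails, the earlier commits are already on the Hub, so only the remaining batches need to be retried. Newer releases of huggingface_hub also provide `HfApi.upload_large_folder`, which is intended for large, resumable uploads and may be simpler if upgrading the library is an option.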