I am trying to push my audio dataset to the Hugging Face Hub using `Dataset.push_to_hub()`, but the upload fails with the following error:

huggingface_hub.utils._errors.BadRequestError: (Request ID: Root=1-66e44a34-2265d8dd5f0712e1239094bc;329d29d8-f6cd-4121-9cef-3848a280d540)
Bad request:
Your proposed upload is smaller than the minimum allowed size
Some context that may be helpful: before pushing the dataset, I need to apply some processing to it. Because of its large volume, I cannot process the whole dataset in one pass with `Dataset.map()`. So I did the following:
- First, I split the dataset into 25 parts (smaller datasets).
- Next, I applied `Dataset.map()` to each of the 25 parts in sequence (one part at a time), saving each processed part to disk with `Dataset.save_to_disk()`.
- After all 25 parts were processed and saved, I loaded each part back with `datasets.load_from_disk()` and merged them into a single dataset with `datasets.concatenate_datasets()`. Finally, I called `Dataset.push_to_hub()` on the whole dataset to upload it to the Hub.
Below is the full error output:
HTTP Error 500 thrown while requesting PUT https://hf-hub-lfs-us-east-1.s3-accelerate.amazonaws.com/repos/5f/7d/5f7dbc96f79ad1a3b092e972838e58d6cec745f81d1bd787f85
c37b48b90c8c2/576e4005b5869a14535512c18922e5b14c5a9eb9ce14e328136abc1cb7eb7807?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credent
ial=AKIA2JU7TKAQLC2QXPN7%2F20240913%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240913T154625Z&X-Amz-Expires=86400&X-Amz-Signature=d484963473b954473d6d730a82d7738
b0df651743bda363dcac3202e3110a827&X-Amz-SignedHeaders=host&partNumber=2&uploadId=knKPR0sUudlf6QzR8gPORZB0WwDRC.L_JMSGlFU6N1TVdHz2Om9VHwbQYCECNPhQ0Qhs4VLgSKT9qKvvNE
8ozxIJLmNvXdcuEFevTghmA5Tlbk.0T2XXdQWy_6.cJXug&x-id=UploadPart
Retrying in 1s [Retry 1/5].
Uploading the dataset shards: 59%|███████████████████████████████████████████████████████ | 253/432 [58:46<41:35, 13.94s/it]
Traceback (most recent call last):
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/haons/.local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/complete_multipart?uploadId=knKPR0sUudlf6QzR8gPORZB0WwDRC.L_JMSGlF
U6N1TVdHz2Om9VHwbQYCECNPhQ0Qhs4VLgSKT9qKvvNE8ozxIJLmNvXdcuEFevTghmA5Tlbk.0T2XXdQWy_6.cJXug&bucket=hf-hub-lfs-us-east-1&prefix=repos%2F5f%2F7d%2F5f7dbc96f79ad1a3b09
2e972838e58d6cec745f81d1bd787f85c37b48b90c8c2&expiration=Sat%2C+14+Sep+2024+15%3A46%3A25+GMT&signature=0a7a2ab7291b25f07e26d920f27340762b86ade8b3ed0d46057da74f9b2a
e6e4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/haons/.local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/complete_multipart?uploadId=knKPR0sUudlf6QzR8gPORZB0WwDRC.L_JMSGlF
U6N1TVdHz2Om9VHwbQYCECNPhQ0Qhs4VLgSKT9qKvvNE8ozxIJLmNvXdcuEFevTghmA5Tlbk.0T2XXdQWy_6.cJXug&bucket=hf-hub-lfs-us-east-1&prefix=repos%2F5f%2F7d%2F5f7dbc96f79ad1a3b09
2e972838e58d6cec745f81d1bd787f85c37b48b90c8c2&expiration=Sat%2C+14+Sep+2024+15%3A46%3A25+GMT&signature=0a7a2ab7291b25f07e26d920f27340762b86ade8b3ed0d46057da74f9b2a
e6e4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 431, in _wrapped_lfs_upload
lfs_upload(operation=operation, lfs_batch_action=batch_action, headers=headers, endpoint=endpoint)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/lfs.py", line 246, in lfs_upload
_upload_multi_part(operation=operation, header=header, chunk_size=chunk_size, upload_url=upload_url)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/lfs.py", line 355, in _upload_multi_part
hf_raise_for_status(completion_res)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError: (Request ID: Root=1-66e45e67-50837d1e4ccce5d6497cf9bd;68743be0-430b-4375-803b-ac20d176835c)
Bad request:
Your proposed upload is smaller than the minimum allowed size
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home4/haons/speaker-verification/push_data_to_huggingface/push_data_vsasv_to_huggingface/src/main.py", line 24, in <module>
dataset.push_to_hub(huggingface_dataset, token= TOKEN)
File "/home/haons/.local/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5414, in push_to_hub
additions, uploaded_size, dataset_nbytes = self._push_parquet_shards_to_hub(
File "/home/haons/.local/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5262, in _push_parquet_shards_to_hub
api.preupload_lfs_files(
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 4317, in preupload_lfs_files
_upload_lfs_files(
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 441, in _upload_lfs_files
_wrapped_lfs_upload(filtered_actions[0])
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 433, in _wrapped_lfs_upload
raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc
RuntimeError: Error while uploading 'data/train-00253-of-00432.parquet' to the Hub.
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/haons/.local/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/complete_multipart?uploadId=knKPR0sUudlf6QzR8gPORZB0WwDRC.L_JMSGlF
U6N1TVdHz2Om9VHwbQYCECNPhQ0Qhs4VLgSKT9qKvvNE8ozxIJLmNvXdcuEFevTghmA5Tlbk.0T2XXdQWy_6.cJXug&bucket=hf-hub-lfs-us-east-1&prefix=repos%2F5f%2F7d%2F5f7dbc96f79ad1a3b09
2e972838e58d6cec745f81d1bd787f85c37b48b90c8c2&expiration=Sat%2C+14+Sep+2024+15%3A46%3A25+GMT&signature=0a7a2ab7291b25f07e26d920f27340762b86ade8b3ed0d46057da74f9b2a
e6e4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 431, in _wrapped_lfs_upload
lfs_upload(operation=operation, lfs_batch_action=batch_action, headers=headers, endpoint=endpoint)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/lfs.py", line 246, in lfs_upload
_upload_multi_part(operation=operation, header=header, chunk_size=chunk_size, upload_url=upload_url)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/lfs.py", line 355, in _upload_multi_part
hf_raise_for_status(completion_res)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 358, in hf_raise_for_status
raise BadRequestError(message, response=response) from e
huggingface_hub.utils._errors.BadRequestError: (Request ID: Root=1-66e45e67-50837d1e4ccce5d6497cf9bd;68743be0-430b-4375-803b-ac20d176835c)
Bad request:
Your proposed upload is smaller than the minimum allowed size
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home4/haons/speaker-verification/push_data_to_huggingface/push_data_vsasv_to_huggingface/src/main.py", line 24, in <module>
dataset.push_to_hub(huggingface_dataset, token= TOKEN)
File "/home/haons/.local/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5414, in push_to_hub
additions, uploaded_size, dataset_nbytes = self._push_parquet_shards_to_hub(
File "/home/haons/.local/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5262, in _push_parquet_shards_to_hub
api.preupload_lfs_files(
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 4317, in preupload_lfs_files
_upload_lfs_files(
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 441, in _upload_lfs_files
_wrapped_lfs_upload(filtered_actions[0])
File "/home/haons/.local/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 433, in _wrapped_lfs_upload
raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc
RuntimeError: Error while uploading 'data/train-00253-of-00432.parquet' to the Hub.
Thanks in advance for your help. This is my first post on the Hugging Face forum, so if I have made any mistakes, please let me know! Thanks again for your consideration!