HTTP 400 on push_to_hub with datasets

I am trying to push a simple JSON dataset to the Hub:

from datasets import load_dataset

# Map each split name to its local JSON Lines file.
data_files = {
    "train": "aps/scirepeval/train/pub_year/train.jsonl",
    "validation": "aps/scirepeval/train/pub_year/val.jsonl",
    "evaluation": "aps/scirepeval/test/pub_year/meta.jsonl",
}
dataset = load_dataset("json", data_files=data_files)
# "<token>" is a placeholder for the real access token.
dataset.push_to_hub("aps6992/pub_year", private=True, token="<token>")

But every time I get an HTTP 400 Client Error:

  File "a.py", line 3, in <module>
    dataset.push_to_hub("aps6992/pub_year", private=True, token='<token>')
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/datasets/dataset_dict.py", line 1350, in push_to_hub
    repo_id, split, uploaded_size, dataset_nbytes, _, _ = self[split]._push_parquet_shards_to_hub(
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 4195, in _push_parquet_shards_to_hub
    _retry(
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 284, in _retry
    raise err
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 281, in _retry
    return func(*func_args, **func_kwargs)
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1967, in upload_file
    pr_url = self.create_commit(
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1844, in create_commit
    _raise_for_status(commit_resp)
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 84, in _raise_for_status
    _raise_with_request_id(request)
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 95, in _raise_with_request_id
    raise e
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 90, in _raise_with_request_id
    request.raise_for_status()
  File "/net/nfs.cirrascale/s2-research/aps/miniconda3/envs/scigpt/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/datasets/aps6992/pub_year/commit/main (Request ID: UC1RTCHxd43a1AixsIzDg)
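
The traceback shows the status code but not the body of the server's reply, and for a 400 that body is often where the actual reason is stated. Below is a minimal sketch, not part of the original post, of how one might print it. `describe_http_error` is a hypothetical helper name, and it assumes only that the exception carries a `response` attribute with `status_code` and `text`, as `requests.exceptions.HTTPError` does.

```python
def describe_http_error(err):
    """Summarize an HTTP error, including the server's response body if present."""
    response = getattr(err, "response", None)
    if response is None:
        # No response attached: fall back to the exception's own message.
        return str(err)
    status = getattr(response, "status_code", "unknown")
    body = (getattr(response, "text", "") or "").strip()
    return f"HTTP {status}: {body or '<empty body>'}"


# Possible usage (assumes `dataset` from the question and a valid token):
#
# import requests
# try:
#     dataset.push_to_hub("aps6992/pub_year", private=True, token="<token>")
# except requests.exceptions.HTTPError as err:
#     print(describe_http_error(err))
```

Whether the body contains a useful message depends on the server and on the library versions involved, so treat this as a debugging aid rather than a fix.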

I have the exact same issue with JSON datasets.

Hey, try updating the huggingface_hub library to version 0.9 or later. That should help you debug the issue.
pip install --upgrade "huggingface_hub>=0.9"
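
To confirm that the environment actually picked up a new enough version, here is a small sketch using only the standard library (`importlib.metadata` is available from Python 3.8). The helper names `parse_version`, `is_at_least`, and `installed_version` are made up for illustration, and the comparison is deliberately simple: it looks only at the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.10.1' into (0, 10, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two dotted version strings by their numeric parts."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("huggingface_hub")
    if found is None:
        print("huggingface_hub is not installed")
    else:
        print(found, "ok" if is_at_least(found, "0.9") else "older than 0.9")
```

Run it with the same interpreter that runs the upload script, since a conda environment and the system Python can have different versions installed.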