When I try to load a newly uploaded dataset with `load_dataset('GBaker/lh_marked_sentence_coref')`,
I get the following error:
```
369 # Convert `HTTPError` into a `HfHubHTTPError` to display request information
370 # as well (request id and/or server error message)
--> 371 raise HfHubHTTPError(str(e), response=response) from e
372
373
HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/GBaker/lh_marked_sentence_coref (Request ID: Root=1-66a3e06f-398c9a801961e61137450cbf;cf0e4b4e-6943-4f1c-8d7c-e726a62a1350)
Internal Error - We're working hard to fix this as soon as possible!
```
But I am able to load the dataset by cloning the repo,
```git clone https://huggingface.co/datasets/GBaker/lh_marked_sentence_coref.git```
and then calling `load_dataset` directly on the data directory.
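For anyone who wants to script that workaround, here is a rough sketch of the idea: try the Hub first, and if that fails, clone the repo and point `load_dataset` at the local copy. The helper names `load_with_clone_fallback` and `_git_clone` are made up for illustration, the `except Exception` is deliberately broad (the error in this thread is an `HfHubHTTPError`), and it assumes `git` (plus Git LFS for large files) is installed.

```python
import subprocess
from pathlib import Path


def _git_clone(url, target):
    # Hypothetical helper: shells out to `git clone`, raising if git fails.
    subprocess.run(["git", "clone", url, str(target)], check=True)


def load_with_clone_fallback(repo_id, clone_dir, load_fn=None, clone_fn=None):
    """Try loading from the Hub; on failure, clone the repo and load it locally."""
    if load_fn is None:
        # Imported lazily so the sketch can be read/tested without `datasets`.
        import datasets
        load_fn = datasets.load_dataset
    if clone_fn is None:
        clone_fn = _git_clone

    try:
        return load_fn(repo_id)
    except Exception as err:  # broad on purpose; narrow this in real code
        print(f"Hub load failed ({err!r}); falling back to a local clone")

    target = Path(clone_dir)
    if not target.exists():
        # Same URL pattern as the `git clone` command above.
        clone_fn(f"https://huggingface.co/datasets/{repo_id}.git", target)
    # Load directly from the cloned data directory.
    return load_fn(str(target))
```

Usage would look something like `ds = load_with_clone_fallback("GBaker/lh_marked_sentence_coref", "./lh_marked_sentence_coref")`. Passing `load_fn` and `clone_fn` explicitly is only there so the fallback logic can be tried out without network access.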
I’m getting the same error when I invoke `load_dataset` with my dataset.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
369 # Convert `HTTPError` into a `HfHubHTTPError` to display request information
370 # as well (request id and/or server error message)
--> 371 raise HfHubHTTPError(str(e), response=response) from e
372
373
HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/neoneye/simon-arc-shape-v4-rev3 (Request ID: Root=1-66a3f20f-2e729b990853409076fedd4a;dee61d3a-4d7c-4a6b-b200-745b903de953)
Internal Error - We're working hard to fix this as soon as possible!
from datatrove.pipeline.readers import ParquetReader

# limit determines how many documents will be streamed (remove for all)
# to fetch a specific dump: hf://datasets/HuggingFaceFW/fineweb/data/CC-MAIN-2024-10
# replace "data" with "sample/100BT" to use the 100BT sample
data_reader = ParquetReader("hf://datasets/HuggingFaceFW/fineweb/sample/10BT", limit=10)
for document in data_reader():
    # do something with document
    print(document)
Same error when pulling `monology/pile-uncopyrighted`:
HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/monology/pile-uncopyrighted (Request ID: Root=1-66a4591f-064984386142166a20871761;bf68f7b5-3022-4773-b476-a7410cf305b3)
Internal Error - We're working hard to fix this as soon as possible!
(It was an internal bug caused by a change to our dataset tagging system, which was recently updated to detect dataset compatibility with a famous Rust-based DataFrame library. Sorry again!)