HfHubHTTPError: 500 Server Error: Internal Server Error for url:

When trying to load a newly uploaded dataset with `load_dataset('GBaker/lh_marked_sentence_coref')`, I get the following error:

```
    369         # Convert `HTTPError` into a `HfHubHTTPError` to display request information
    370         # as well (request id and/or server error message)
--> 371         raise HfHubHTTPError(str(e), response=response) from e
    372 
    373 

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/GBaker/lh_marked_sentence_coref (Request ID: Root=1-66a3e06f-398c9a801961e61137450cbf;cf0e4b4e-6943-4f1c-8d7c-e726a62a1350)

Internal Error - We're working hard to fix this as soon as possible!
```

But I am able to load the dataset by cloning the repo,
```git clone https://huggingface.co/datasets/GBaker/lh_marked_sentence_coref.git```
and then calling `load_dataset` directly on the data directory.
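
For reference, a minimal sketch of that workaround (the local path is just whatever `git clone` creates, or its data subdirectory; adjust it to your own checkout):

```
from datasets import load_dataset

# Load from the local clone instead of the Hub repo ID,
# which avoids the failing /api/datasets call entirely.
ds = load_dataset("./lh_marked_sentence_coref")
print(ds)
```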

I’m getting the same error when I invoke `load_dataset` with my dataset.

```
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    369         # Convert `HTTPError` into a `HfHubHTTPError` to display request information
    370         # as well (request id and/or server error message)
--> 371         raise HfHubHTTPError(str(e), response=response) from e
    372 
    373 

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/neoneye/simon-arc-shape-v4-rev3 (Request ID: Root=1-66a3f20f-2e729b990853409076fedd4a;dee61d3a-4d7c-4a6b-b200-745b903de953)
Internal Error - We're working hard to fix this as soon as possible!
```

Browser

When I access the API in the browser:
https://huggingface.co/api/datasets/neoneye/simon-arc-shape-v4-rev3

{"error":"Internal Error - We're working hard to fix this as soon as possible!"}

Request headers

```
Accept	text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Encoding	gzip, deflate, br, zstd
Accept-Language	en-US,en;q=0.5
Connection	keep-alive
Host	huggingface.co
Priority	u=1
Sec-Fetch-Dest	document
Sec-Fetch-Mode	navigate
Sec-Fetch-Site	cross-site
Upgrade-Insecure-Requests	1
User-Agent	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0
```

Response headers

```
X-Firefox-Spdy	h2
access-control-allow-origin	https://huggingface.co
access-control-expose-headers	X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range
content-length	80
content-type	application/json; charset=utf-8
cross-origin-opener-policy	same-origin
date	Fri, 26 Jul 2024 19:09:45 GMT
etag	W/"50-9qrwU+BNI4SD0Fe32p/nofkmv0c"
referrer-policy	strict-origin-when-cross-origin
vary	Origin
via	1.1 1624c79cd07e6098196697a6a7907e4a.cloudfront.net (CloudFront)
x-amz-cf-id	SP8E7n5qRaP6i9c9G83dNAiOzJBU4GXSrDRAcVNTomY895K35H0nJQ==
x-amz-cf-pop	CPH50-C1
x-cache	Error from cloudfront
x-error-message	Internal Error - We're working hard to fix this as soon as possible!
x-powered-by	huggingface-moon
x-request-id	Root=1-66a3f479-026417465ef42f49349fdca1
```
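
The same check can be scripted; here is a quick sketch with plain `requests`, nothing dataset-specific, just the Hub API endpoint from above:

```
import requests

# Hit the Hub's dataset-info endpoint directly; during the outage this
# returned HTTP 500 with the "Internal Error" JSON body shown above.
resp = requests.get("https://huggingface.co/api/datasets/neoneye/simon-arc-shape-v4-rev3")
print(resp.status_code)
print(resp.json())
```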

Same here. I also ran into this error when trying to access my dataset via the API.


Same error. Please look into it.
Code


```
from datatrove.pipeline.readers import ParquetReader

# limit determines how many documents will be streamed (remove for all)
# to fetch a specific dump: hf://datasets/HuggingFaceFW/fineweb/data/CC-MAIN-2024-10
# replace "data" with "sample/100BT" to use the 100BT sample
data_reader = ParquetReader("hf://datasets/HuggingFaceFW/fineweb/sample/10BT", limit=10)
for document in data_reader():
    # do something with document
    print(document)
```

Same problem since this morning, cannot access any newly uploaded datasets.

Same error here. Please help.

Same error now.

Same error when pulling monology/pile-uncopyrighted

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/monology/pile-uncopyrighted (Request ID: Root=1-66a4591f-064984386142166a20871761;bf68f7b5-3022-4773-b476-a7410cf305b3)

Internal Error - We're working hard to fix this as soon as possible!

I’m getting the same error. I tried

  • deleting the dataset and creating it again
  • uploading it to a different organisation
  • changing the data format from parquet to arrow
  • recreating the access token that I’m using for HfApi

but none of them solved the issue.


Me too.
When I’m iterating, this happens and ruins 30 hours of training: …


I have created an issue on the GitHub datasets repo that links back to this discussion.


One alternative worked for me:

```
from datasets import load_dataset

# clone the dataset repo locally first
!git clone yourdataset_path

dataset = load_dataset("parquet", data_dir="yourdataset_local_path")
```


I don’t understand what repo to clone. Let’s say for `tasksource/icl-symbol-tuning-instruct`, what do I clone?

Thanks mate!
Worked for me too… seems okay when the dataset is local…

Hi, you have to clone the whole URL:
```!git clone https://huggingface.co/datasets/tasksource/icl-symbol-tuning-instruct```


Thanks for the help. This worked:

```
from datasets import load_dataset

# !git clone https://huggingface.co/datasets/tasksource/icl-symbol-tuning-instruct

ds = load_dataset('parquet', data_dir='icl-symbol-tuning-instruct')
```

We’re working on a fix! 🙂

We fixed the issue; you can load datasets again. Sorry for the inconvenience!


(It was an internal bug due to a modification of our dataset tagging system, which was recently updated to detect dataset compatibility with a famous Rust-based DataFrame library. Sorry again!)


Hello! I am still seeing this issue on some datasets (https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML).
