Hi, I'm streaming the laion2b dataset using:
self.dataset = load_dataset("laion/laion2b-en", streaming=True, split="train")
And I'm getting this error:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out.
This is not the interesting part. What's interesting is that it worked for two weeks in a row, and then, out of nowhere, the streaming stopped. Now I can't run it at all (I get the error above).
My network manager says nothing changed in the configuration, the proxy, or anything else. Did something change on the “datasets” package side?
The full trace is:
File "/workspace/dir/dir_env/lib/python3.8/site-packages/datasets/load.py", line 1502, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/workspace/dir/dir_env/lib/python3.8/site-packages/datasets/load.py", line 1219, in dataset_module_factory
raise e1 from None
File "/workspace/dir/dir_env/lib/python3.8/site-packages/datasets/load.py", line 1186, in dataset_module_factory
raise e
File "/workspace/dir/dir_env/lib/python3.8/site-packages/datasets/load.py", line 1160, in dataset_module_factory
dataset_info = hf_api.dataset_info(
File "/workspace/dir/dir_env/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/workspace/dir/dir_env/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1666, in dataset_info
r = get_session().get(path, headers=headers, timeout=timeout, params=params)
File "/workspace/dir/dir_env/lib/python3.8/site-packages/requests/sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "/workspace/dir/dir_env/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/workspace/dir/dir_env/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/workspace/dir/dir_env/lib/python3.8/site-packages/requests/adapters.py", line 532, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=100.0)
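
In case it helps anyone hitting the same thing, here is a minimal sketch of a possible workaround, assuming the timeout is transient (which I have not confirmed). It retries a call with exponential backoff. The helper name call_with_retries and its parameters are hypothetical and are not part of datasets or huggingface_hub.

```python
import time


def call_with_retries(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on the given exceptions with exponential backoff.

    Hypothetical helper for illustration only. `sleep` is injectable so the
    backoff can be exercised without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            # Out of attempts: re-raise the last error unchanged.
            if attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... base_delay seconds before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Intended usage with the call from this post (needs network access):
#
# import requests
# from datasets import load_dataset
#
# dataset = call_with_retries(
#     lambda: load_dataset("laion/laion2b-en", streaming=True, split="train"),
#     retry_on=(requests.exceptions.ReadTimeout,),
# )
```

This only papers over intermittent timeouts. If every attempt times out, the last ReadTimeout is still raised, so it does not explain why the request started failing in the first place.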