### Describe the bug
It's weird: from my office, I cannot connect to the Hugging Face Hub to download datasets because of an `SSLError`.
Even when I configure my company's proxy address (via the `http_proxy` and `https_proxy` environment variables),
I still get the `SSLError`. What should I do to download datasets stored on the Hugging Face Hub?
Any comments or suggestions would be appreciated.
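Because the proxy is configured through environment variables, one quick check is whether those variables actually reach the Python process that calls `load_dataset()`. The sketch below is a diagnostic only, not a fix. It prints what Python's standard library sees via `urllib.request.getproxies()`; I am assuming that the HTTP client used by `datasets` (`requests`) reads the same `*_proxy` variables.

```python
import urllib.request


def visible_proxies() -> dict:
    """Return the proxy settings this Python process sees.

    On Linux, urllib.request.getproxies() collects *_proxy environment
    variables (e.g. http_proxy, https_proxy), preferring lowercase names.
    """
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = visible_proxies()
    for scheme in ("http", "https"):
        # "<not set>" means the variable never reached this process.
        print(f"{scheme}_proxy: {proxies.get(scheme, '<not set>')}")
```

If `https_proxy` shows as `<not set>` here, the proxy settings are not being inherited by the process at all.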
* Dataset address - https://huggingface.co/datasets/moyix/debian_csrc/viewer/moyix--debian_csrc
* Log message
```
............ OMISSION ..............
Traceback (most recent call last):
File "/data/home/geunsik-lim/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 587, in <module>
main()
File "/data/home/geunsik-lim/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 278, in main
raw_datasets = load_dataset(
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1719, in load_dataset
builder_instance = load_dataset_builder(
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1497, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1222, in dataset_module_factory
raise e1 from None
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1179, in dataset_module_factory
raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
ConnectionError: Couldn't reach 'moyix/debian_csrc' on the Hub (SSLError)
[2022-11-07 15:23:38,476] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 6760
[2022-11-07 15:23:38,476] [ERROR] [launch.py:324:sigkill_handler] ['/home/geunsik-lim/anaconda3/envs/deepspeed/bin/python', '-u', './transformers/examples/pytorch/language-modeling/run_clm.py', '--local_rank=0', '--model_name_or_path=Salesforce/codegen-350M-multi', '--per_device_train_batch_size=1', '--learning_rate', '2e-5', '--num_train_epochs', '1', '--output_dir=./codegen-350M-finetuned', '--overwrite_output_dir', '--dataset_name', 'moyix/debian_csrc', '--cache_dir', '/data/home/geunsik-lim/.cache', '--tokenizer_name', 'Salesforce/codegen-350M-multi', '--block_size', '2048', '--gradient_accumulation_steps', '32', '--do_train', '--fp16', '--deepspeed', 'ds_config_zero2.json'] exits with return code = 1
real 0m7.742s
user 0m4.930s
```
### Steps to reproduce the bug
Steps to reproduce this behavior.
```
(deepspeed) geunsik-lim@ai02:~/qtlab$ ./test_debian_csrc_dataset.py
Traceback (most recent call last):
File "/data/home/geunsik-lim/qtlab/./test_debian_csrc_dataset.py", line 6, in <module>
dataset = load_dataset("moyix/debian_csrc")
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1719, in load_dataset
builder_instance = load_dataset_builder(
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1497, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1222, in dataset_module_factory
raise e1 from None
File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1179, in dataset_module_factory
raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
ConnectionError: Couldn't reach 'moyix/debian_csrc' on the Hub (SSLError)
(deepspeed) geunsik-lim@ai02:~/qtlab$ cat ./test_debian_csrc_dataset.py
#!/usr/bin/env python
from datasets import load_dataset
dataset = load_dataset("moyix/debian_csrc")
```
1. Add the company's proxy address (`http_proxy` and `https_proxy`) to `/etc/profile`.
2. Download the dataset with the `load_dataset()` function from Hugging Face's `datasets` package.
3. In this case, the dataset ID is `moyix/debian_csrc` (it appears as `moyix--debian_csrc` in the dataset viewer URL).
4. The call fails with `ConnectionError: Couldn't reach 'moyix/debian_csrc' on the Hub (SSLError)`.
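One common cause of an `SSLError` behind a corporate proxy is a proxy that intercepts TLS and re-signs certificates with a company root CA. That CA is not in the `certifi` bundle that `requests` uses by default. This is an assumption: the log above does not show the underlying SSL error. Under that assumption, a minimal sketch of a workaround sets the proxy variables and `REQUESTS_CA_BUNDLE` for the current process before `load_dataset()` runs. The proxy URL, the CA path and the helper `load_debian_csrc()` are hypothetical placeholders. The sketch also assumes the Hub requests go through `requests`, which honors these variables.

```python
import os
from typing import Optional

# Hypothetical placeholders: the real proxy URL and the company's root CA
# certificate (PEM format) are not given in this report.
PROXY_URL = "http://proxy.example.com:8080"
COMPANY_CA_BUNDLE = "/usr/local/share/ca-certificates/company-root-ca.crt"


def configure_proxy_env(proxy_url: str, ca_bundle: Optional[str] = None) -> None:
    """Export proxy (and optionally CA bundle) settings for this process.

    requests reads http(s)_proxy and REQUESTS_CA_BUNDLE from the environment
    when it sends a request, so they must be set before load_dataset() runs.
    """
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[name] = proxy_url
    if ca_bundle is not None:
        # Use this CA file instead of requests' default certifi bundle.
        os.environ["REQUESTS_CA_BUNDLE"] = ca_bundle


def load_debian_csrc():
    """Hypothetical helper: configure the environment, then load the dataset."""
    configure_proxy_env(PROXY_URL, COMPANY_CA_BUNDLE)
    from datasets import load_dataset  # only needed for the actual download

    return load_dataset("moyix/debian_csrc")
```

Calling `load_debian_csrc()` would replace the bare `load_dataset(...)` call in `test_debian_csrc_dataset.py`. Note that `REQUESTS_CA_BUNDLE` replaces the default bundle rather than extending it, so the file may also need to contain the public CAs for any hosts the proxy does not re-sign.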
### Expected behavior
* `load_dataset("moyix/debian_csrc")` should download the dataset through the company proxy.
* Actual result: `ConnectionError: Couldn't reach 'moyix/debian_csrc' on the Hub (SSLError)`
### Environment info
* software version information:
```
(deepspeed) geunsik-lim@ai02:~$
(deepspeed) geunsik-lim@ai02:~$ conda list -f pytorch
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name Version Build Channel
pytorch 1.13.0 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
(deepspeed) geunsik-lim@ai02:~$ conda list -f python
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name Version Build Channel
python 3.10.6 haa1d7c7_1
(deepspeed) geunsik-lim@ai02:~$ conda list -f datasets
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name Version Build Channel
datasets 2.6.1 py_0 huggingface
(deepspeed) geunsik-lim@ai02:~$ uname -a
Linux ai02 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
(deepspeed) geunsik-lim@ai02:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
```
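The versions above were collected with `conda list`. For SSL problems, the OpenSSL build and the versions of the HTTP-related packages may also matter. The sketch below gathers them from inside the same Python environment using only the standard library. The package list is an arbitrary choice, not something requested by the maintainers.

```python
import platform
import ssl
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("datasets", "huggingface_hub", "requests", "torch")) -> dict:
    """Collect interpreter, OpenSSL and package versions as a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "openssl": ssl.OPENSSL_VERSION,  # OpenSSL linked into this Python
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```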