Hugging Face error requests.exceptions.ConnectionError: ProtocolError('Connection aborted.') - how to fix?

I’m getting the following mysterious error from Hugging Face:

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 591, in <module>
    print(f'{all_columns=}')
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 553, in experiment_compute_diveristy_coeff_single_dataset_then_combined_datasets_with_domain_weights
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 64, in get_diversity_coefficient
    embedding, loss = Task2Vec(probe_network, classifier_opts={'seed': seed}).embed(tokenized_batch)
  File "/afs/cs.stanford.edu/u/brando9/beyond-scale-language-data-diversity/src/diversity/task2vec.py", line 133, in embed
    loss = self._finetune_classifier(dataset, loader_opts=self.loader_opts, classifier_opts=self.classifier_opts, max_samples=self.max_samples, epochs=epochs)
  File "/afs/cs.stanford.edu/u/brando9/beyond-scale-language-data-diversity/src/diversity/task2vec.py", line 198, in _finetune_classifier
    for step, batch in enumerate(epoch_iterator):
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1353, in __iter__
    for key, example in ex_iterable:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 652, in __iter__
    yield from self._iter()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 667, in _iter
    for key, example in iterator:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1088, in __iter__
    for key, example in self.ex_iterable:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1013, in __iter__
    yield from islice(self.ex_iterable, self.n)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 400, in __iter__
    yield next(iterators[i])
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 73, in __next__
    result = next(self.it)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 652, in __iter__
    yield from self._iter()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 714, in _iter
    for key, example in iterator:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 1088, in __iter__
    for key, example in self.ex_iterable:
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/iterable_dataset.py", line 255, in __iter__
    for key, pa_table in self.generate_tables_fn(**self.kwargs):
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/packaged_modules/parquet/parquet.py", line 77, in _generate_tables
    parquet_file = pq.ParquetFile(f)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 334, in __init__
    self.reader.open(
  File "pyarrow/_parquet.pyx", line 1220, in pyarrow._parquet.ParquetReader.open
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/datasets/download/streaming_download_manager.py", line 333, in read_with_retries
    out = read(*args, **kwargs)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/fsspec/spec.py", line 1790, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/fsspec/caching.py", line 156, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/huggingface_hub/hf_file_system.py", line 404, in _fetch_range
    r = http_backoff("GET", url, headers=headers)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 258, in http_backoff
    response = session.request(method=method, url=url, **kwargs)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 63, in send
    return super().send(request, *args, **kwargs)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')), '(Request ID: d98c4e56-198a-40cb-a53e-9e1348ea1c58)')

How does one solve this issue in the context of Hugging Face?

Cross-posted to Stack Overflow: Hugging face error requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.') how to fix? - Stack Overflow

Due to Hugging Face datasets disappearing, I’ve had to get the data from their data viewer using the parquet option. But when I try to run it there is some sort of HTTP error. I’ve tried downloading the data but I can’t. What is the recommended way to solve this problem?

Partial code (and full code):

    # - 5 subsets of the pile interleaved
    # from diversity.pile_subset_urls import urls_hacker_news, urls_nih_exporter, urls_pubmed, urls_uspto
    # from diversity.data_mixtures import get_uniform_data_mixture_5subsets_of_pile, get_doremi_data_mixture_5subsets_of_pile, get_llama_v1_data_mixtures_5subsets_of_pile
    # path, name, data_files, split = ['suolyer/pile_pile-cc'] + ['parquet'] * 4, [None] + ['hacker_news', 'nih_exporter', 'pubmed', 'uspto'], [None] + [urls_hacker_news, urls_nih_exporter, urls_pubmed, urls_uspto], ['validation'] + ['train'] * 4
    # ## path, name, data_files = ['conceptofmind/pile_cc'] + ['parquet'] * 4, ['sep_ds'] + ['hacker_news', 'nih_exporter', 'pubmed', 'uspto'], [None] + [urls_hacker_news, urls_nih_exporter, urls_pubmed, urls_uspto]
    # # probabilities, data_mixture_name = get_uniform_data_mixture_5subsets_of_pile()
    # # probabilities, data_mixture_name = get_llama_v1_data_mixtures_5subsets_of_pile(name)
    # probabilities, data_mixture_name = get_doremi_data_mixture_5subsets_of_pile(name)
    # - probe net
    pretrained_model_name_or_path = 'meta-llama/Llama-2-7b-hf'
    # - not changing
    batch_size = 512
    today = datetime.datetime.now().strftime('%Y-m%m-d%d-t%Hh_%Mm_%Ss')
    run_name = f'{path} div_coeff_{num_batches=} ({today=} ({name=}) {data_mixture_name=} {probabilities=} {pretrained_model_name_or_path=})'
    print(f'\n---> {run_name=}\n')

    # - Init wandb
    debug: bool = mode == 'dryrun'
    run = wandb.init(mode=mode, project="beyond-scale", name=run_name, save_code=True)
    wandb.config.update({"num_batches": num_batches, "path": path, "name": name, "today": today, 'probabilities': probabilities, 'batch_size': batch_size, 'debug': debug, 'data_mixture_name': data_mixture_name, 'streaming': streaming, 'data_files': data_files, 'seed': seed, 'pretrained_model_name_or_path': pretrained_model_name_or_path})
    # run.notify_on_failure() # https://community.wandb.ai/t/how-do-i-set-the-wandb-alert-programatically-for-my-current-run/4891
    print(f'{debug=}')
    print(f'{wandb.config=}')

    # -- Get probe network
    from datasets import load_dataset 
    from datasets.iterable_dataset import IterableDataset
    import torch
    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    # tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # if tokenizer.pad_token_id is None:
    #     tokenizer.pad_token = tokenizer.eos_token
    # probe_network = GPT2LMHeadModel.from_pretrained("gpt2")
    # device = torch.device(f"cuda:{0}" if torch.cuda.is_available() else "cpu")
    # probe_network = probe_network.to(device)

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
    # bfloat16 needs a GPU with compute capability >= 8 (Ampere or newer); otherwise fall back to fp32
    bf16 = torch.cuda.get_device_capability(torch.cuda.current_device())[0] >= 8
    torch_dtype = torch.bfloat16 if bf16 else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path,
        # quantization_config=quantization_config,
        # device_map=device_map,  # device_map = None  https://github.com/huggingface/trl/blob/01c4a35928f41ba25b1d0032a085519b8065c843/examples/scripts/sft_trainer.py#L82
        trust_remote_code=True,
        torch_dtype=torch_dtype,
        use_auth_token=True,
    )
    print(f'{pretrained_model_name_or_path=}')
    # https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L347C13-L347C13
    tokenizer = AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path,
        # cache_dir=args.cache_dir,
        padding_side="right",
        use_fast=False, # Fast tokenizer giving issues.
        # tokenizer_type='llama' if 'llama' in args.model_name_or_path else None, # Needed for HF name change
        # tokenizer_type='llama',
        trust_remote_code=True,
        use_auth_token=True,
    )
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token
    probe_network = model

    # -- Get data set
    def my_load_dataset(path, name, data_files=data_files, split=split):
        print(f'{path=} {name=} {streaming=} {data_files=}, {split=}')
        if path == 'json' or path == 'bin' or path == 'csv':
            print(f'{data_files_prefix+name=}')
            return load_dataset(path, data_files=data_files_prefix+name, streaming=streaming, split=split).with_format("torch")
        elif path == 'parquet':
            print(f'{data_files=}')
            return load_dataset(path, data_files=data_files, streaming=streaming, split=split).with_format("torch")
        else:
            # fall back to a dataset hosted on the Hub, referenced by name (e.g. 'suolyer/pile_pile-cc')
            return load_dataset(path, name, streaming=streaming, split=split).with_format("torch")
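Presumably the per-subset datasets are then built and interleaved roughly as follows; this is only a sketch under that assumption, not the actual calling code (interleave_datasets is the standard datasets helper, and path, name, data_files, split, probabilities are the lists from the commented-out mixture config above):

    from datasets import interleave_datasets

    # Sketch only: one streamed dataset per subset, interleaved with the mixture probabilities.
    datasets = [my_load_dataset(p, n, data_files=df, split=s)
                for p, n, df, s in zip(path, name, data_files, split)]
    dataset = interleave_datasets(datasets, probabilities=probabilities, seed=seed)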

How to fix it? Ideas:

  1. Download the parquet data locally and load it from disk (see the sketch below).
  2. Stop the HTTP error, e.g. by retrying around the failing requests.
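For idea 1, a minimal sketch (assuming data_files is the list of parquet URLs taken from the data viewer, as in the commented-out mixture config, and that a local parquet_cache/ directory is acceptable): download each file once with plain requests, then point load_dataset at the local copies so iteration never touches the network.

    import os
    import requests
    from datasets import load_dataset

    os.makedirs("parquet_cache", exist_ok=True)
    local_files = []
    for url in data_files:  # the parquet URLs copied from the data viewer
        dest = os.path.join("parquet_cache", os.path.basename(url))
        if not os.path.exists(dest):
            with requests.get(url, stream=True, timeout=60) as response:
                response.raise_for_status()
                with open(dest, "wb") as f:
                    for chunk in response.iter_content(chunk_size=1 << 20):
                        f.write(chunk)
        local_files.append(dest)

    # Everything is on disk now, so no remote connection can be dropped mid-iteration.
    dataset = load_dataset("parquet", data_files=local_files, split="train").with_format("torch")

For idea 2, the blunter workaround would be to catch requests.exceptions.ConnectionError around the batch loop and retry after a short sleep, but loading from local files sidesteps the flaky connection entirely.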



Another occurrence of the same error (traceback truncated at the top):

  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/lfs/hyperturing2/0/brando9/miniconda/envs/beyond_scale/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 695, in <module>
    datasets = [dataset.remove_columns(columns_to_remove) for dataset in datasets]
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 656, in experiment_compute_diveristy_coeff_single_dataset_then_combined_datasets_with_domain_weights
    return load_dataset(path, data_files=data_files_prefix+name, streaming=streaming, split=split).with_format("torch")
  File "/lfs/hyperturing2/0/brando9/beyond-scale-language-data-diversity/src/diversity/div_coeff.py", line 64, in get_diversity_coefficient
    embedding, loss = Task2Vec(probe_network, classifier_opts={'seed': seed}).embed(tokenized_batch)
  File "/afs/cs.stanford.edu/u/brando9/beyond-scale-language-data-diversity/src/diversity/task2vec.py", line 133, in embed
    loss = self._finetune_classifier(dataset, loader_opts=self.loader_opts, classifier_opts=self.classifier_opts, max_samples=self.max_samples, epochs=epochs)

cc @lhoestq @albertvillanova @polinaeterna @mariosasko

Could you share minimal code to reproduce the issue?

Also, it might have been an issue with our server at that time, and it should be fixed by now.

I can in a bit. The code is trivial: get an HF dataset and loop through it for days in streaming mode. Eventually the parquet data format usage fails.
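Roughly, the failing loop looks like this (a sketch with a placeholder URL; the real run streams the interleaved Pile subsets):

    from datasets import load_dataset

    # Placeholder URL; the real run uses the parquet URLs taken from the dataset viewer.
    urls = ["https://huggingface.co/datasets/<user>/<dataset>/resolve/main/data/train-00000.parquet"]
    ds = load_dataset("parquet", data_files=urls, streaming=True, split="train")
    for i, example in enumerate(ds):
        pass  # after enough range requests the remote end occasionally closes the connection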

If I could simply download it all in one go from the data viewer, that would solve my problem.

In the dataset viewer you can click on “Auto-converted to Parquet”

[Screenshot: the dataset viewer with the “Auto-converted to Parquet” link]

It will take you to the directory of parquet files, which you can download manually.
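Those files can also be fetched programmatically; they live on the refs/convert/parquet revision of the dataset repo. A sketch with a placeholder repo id:

    from huggingface_hub import HfApi, hf_hub_download

    repo_id = "<user>/<dataset>"  # placeholder: the dataset repo you are streaming
    api = HfApi()
    parquet_files = [
        f for f in api.list_repo_files(repo_id, repo_type="dataset", revision="refs/convert/parquet")
        if f.endswith(".parquet")
    ]
    local_paths = [
        hf_hub_download(repo_id, f, repo_type="dataset", revision="refs/convert/parquet")
        for f in parquet_files
    ]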

Does it help in your case?