Download only 1 of many parquet files

Hi all,

Just wondering if there is a way to download just 1 (or 2) of the parquet files that are available.

This answer suggests using streaming, but I was wondering if there is a different way to do this too. For example, one answer that has potential but isn't working for me is:

import datasets
import config

if __name__ == "__main__":
    hyper_parameters = config.DataConfig()

    dataset = datasets.load_dataset(
        "Multimodal-Fatima/COCO_captions_train",
        cache_dir=config.IMAGE_DOWNLOAD_PATH,
        data_files={"train": "data/train-00000-of-00038-757e7d149500e41c.parquet"},  # only this shard
    )
    print(len(dataset["train"]))

which gives the error:

Generating train split:   3%|██▊                                                                                                          | 2982/113287 [00:00<00:14, 7393.63 examples/s]
Traceback (most recent call last):
  File "/Users/sachinthakaabeywardana/personal_work/tiny_captions/src/download.py", line 7, in <module>
    dataset = datasets.load_dataset(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/datasets/load.py", line 2582, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/homebrew/lib/python3.12/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/opt/homebrew/lib/python3.12/site-packages/datasets/builder.py", line 1118, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/opt/homebrew/lib/python3.12/site-packages/datasets/utils/info_utils.py", line 101, in verify_splits
    raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=18595506212.0, num_examples=113287, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=481592719, num_examples=2982, shard_lengths=None, dataset_name='coco_captions_train')}]

Using the latest version of datasets should fix the issue.
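
If upgrading alone doesn't resolve it, here is a minimal sketch of two possible workarounds (assuming a reasonably recent datasets and huggingface_hub; the shard filename is the one from the original post). The error comes from the split-size verification comparing your single shard against the full dataset's metadata, so you can either skip that check or fetch the one parquet file yourself and load it as a local parquet dataset:

import datasets
from huggingface_hub import hf_hub_download

# Option 1: load only one shard and skip the split-size check that raises
# NonMatchingSplitsSizesError (older datasets versions used ignore_verifications=True).
dataset = datasets.load_dataset(
    "Multimodal-Fatima/COCO_captions_train",
    data_files={"train": "data/train-00000-of-00038-757e7d149500e41c.parquet"},
    verification_mode="no_checks",
)
print(len(dataset["train"]))

# Option 2: download just that one parquet file from the Hub,
# then load it as a plain local parquet dataset.
local_path = hf_hub_download(
    repo_id="Multimodal-Fatima/COCO_captions_train",
    filename="data/train-00000-of-00038-757e7d149500e41c.parquet",
    repo_type="dataset",
)
dataset = datasets.load_dataset("parquet", data_files={"train": local_path})
print(len(dataset["train"]))

Option 2 avoids the dataset's recorded split metadata entirely, so no verification mismatch can occur; Option 1 keeps the original loading path but tells the builder not to compare sizes.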