How to load a local dataset

Hi

Since my Ubuntu machine can’t connect to Hugging Face, I have to load the dataset (Fill50K) locally. I first ran git clone https://huggingface.co/datasets/fusing/fill50k on my PC, then uploaded the repository to my Ubuntu machine and unzipped the images. It has the following directory structure:

  • fill50k
    • fill50k.py
    • images
    • conditioning_images
    • train.jsonl
    • images.zip
    • conditioning_images.zip

I modified _split_generators in fill50k.py, commenting out the hub URLs and pointing to the local files instead:

# METADATA_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="train.jsonl",
#     repo_type="dataset",
# )

# IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="images.zip",
#     repo_type="dataset",
# )

# CONDITIONING_IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="conditioning_images.zip",
#     repo_type="dataset",
# )


def _split_generators(self, dl_manager):
    # metadata_path = dl_manager.download(METADATA_URL)
    # images_dir = dl_manager.download_and_extract(IMAGES_URL)
    # conditioning_images_dir = dl_manager.download_and_extract(
    #     CONDITIONING_IMAGES_URL
    # )

    metadata_path = './train.jsonl'
    images_dir = './images'
    conditioning_images_dir = './conditioning_images'

    return [
        datasets.SplitGenerator(
            name=datasets.Split.TRAIN,
            # These kwargs will be passed to _generate_examples
            gen_kwargs={
                "metadata_path": metadata_path,
                "images_dir": images_dir,
                "conditioning_images_dir": conditioning_images_dir,
            },
        ),
    ]
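
A note on the hard-coded paths: relative paths such as './train.jsonl' are resolved against the current working directory of the Python process, not against the folder that contains fill50k.py. If the script is launched from the parent of fill50k, those paths would not point at the unzipped files. The sketch below is one possible way to build absolute paths and fail early when something is missing. The helper name resolve_fill50k_paths and its dataset_dir argument are hypothetical, not part of the datasets library or of the original script.

```python
from pathlib import Path


def resolve_fill50k_paths(dataset_dir):
    """Return absolute paths to the local Fill50K files.

    `dataset_dir` is assumed to be the folder that holds train.jsonl,
    images/ and conditioning_images/ (the layout listed in the post).
    """
    root = Path(dataset_dir).expanduser().resolve()
    paths = {
        "metadata_path": root / "train.jsonl",
        "images_dir": root / "images",
        "conditioning_images_dir": root / "conditioning_images",
    }
    # Fail early with a clear message instead of erroring later during generation.
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing local dataset files: {missing}")
    # Plain strings, so the result can be used directly as gen_kwargs.
    return {name: str(p) for name, p in paths.items()}
```

The returned dictionary has the same three keys as the gen_kwargs above, so it could be passed as gen_kwargs=resolve_fill50k_paths("/absolute/path/to/fill50k"), where the path shown is a placeholder for wherever the folder lives on your machine.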

When I run dataset = load_dataset('./fill50k'), though, it still downloads the files:

from datasets import load_dataset
dataset = load_dataset('./fill50k')

---------------------
No config specified, defaulting to: fill50k/default
Downloading and preparing dataset fill50k/default to /home/xxx/.cache/huggingface/datasets/fill50k/default

How can I solve this? Thanks.

What makes you think it downloads files?

Note that the “Downloading and preparing dataset…” message is generic and is shown even if there are no files to download.
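
If you want to confirm that nothing is fetched over the network, one option is to force offline mode before loading. The sketch below relies on the HF_DATASETS_OFFLINE environment variable, which to my knowledge the datasets library reads when it is imported; with it set, any attempt to reach the Hub should raise an error instead of downloading silently. The wrapper name load_local_fill50k is hypothetical, and the sketch assumes your installed datasets version still supports loading scripts such as fill50k.py.

```python
import os

# Set this before `datasets` is imported: the library reads it at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"


def load_local_fill50k(script_dir="./fill50k"):
    """Load the dataset from a local script folder with offline mode enabled."""
    # Imported lazily so the environment variable above is already in place.
    from datasets import load_dataset

    return load_dataset(script_dir)
```

If load_local_fill50k() finishes without a connection error, the data came from local files (or the local cache), and the “Downloading and preparing dataset…” line was only the generic preparation message.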