How to load local dataset

unrealMJ · April 22, 2023, 3:02am

Hi

Since my Ubuntu machine can't connect to Hugging Face, I have to load the dataset (Fill50K) locally. I first ran git clone https://huggingface.co/datasets/fusing/fill50k on my PC, then uploaded the result to my Ubuntu machine and unzipped the images. It has the following directory structure:

fill50k
- fill50k.py
- images
- conditioning_images
- train.jsonl
- images.zip
- conditioning_images.zip

I modified _split_generators in fill50k.py to point at the local files:

# METADATA_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="train.jsonl",
#     repo_type="dataset",
# )

# IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="images.zip",
#     repo_type="dataset",
# )

# CONDITIONING_IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="conditioning_images.zip",
#     repo_type="dataset",
# )


def _split_generators(self, dl_manager):
        # metadata_path = dl_manager.download(METADATA_URL)
        # images_dir = dl_manager.download_and_extract(IMAGES_URL)
        # conditioning_images_dir = dl_manager.download_and_extract(
        #     CONDITIONING_IMAGES_URL
        # )

        metadata_path = './train.jsonl'
        images_dir = './images'
        conditioning_images_dir = './conditioning_images'

        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                # These kwargs will be passed to _generate_examples
                gen_kwargs={
                    "metadata_path": metadata_path,
                    "images_dir": images_dir,
                    "conditioning_images_dir": conditioning_images_dir,
                },
            ),
        ]

But when I run dataset = load_dataset('./fill50k'), it still seems to download the files:

from datasets import load_dataset
dataset = load_dataset('./fill50k')

---------------------
No config specified, defaulting to: fill50k/default
Downloading and preparing dataset fill50k/default to /home/xxx/.cache/huggingface/datasets/fill50k/default

How can I solve this? Thanks.

lhoestq · May 2, 2023, 12:42pm

What makes you think it downloads files?

Note that the “Downloading and preparing dataset…” message is generic: it is shown even when there are no files to download.
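One way to check this is to run the load with network access disabled, so that any real download attempt fails loudly instead of passing silently. The following is a minimal sketch, not a definitive recipe. It assumes the HF_DATASETS_OFFLINE environment variable is honored by the installed version of datasets, and that the installed version still supports loading scripts such as fill50k.py. The helper names missing_local_files and load_fill50k_offline are hypothetical, introduced only for illustration.

```python
import os
from pathlib import Path

# Set this before importing `datasets`. In offline mode the library is
# expected to raise an error rather than make a network request.
os.environ["HF_DATASETS_OFFLINE"] = "1"

# Entries the modified fill50k.py relies on, per the directory listing above.
EXPECTED = ("fill50k.py", "train.jsonl", "images", "conditioning_images")


def missing_local_files(dataset_dir):
    """Return the expected entries that are absent from dataset_dir."""
    root = Path(dataset_dir).expanduser().resolve()
    return [name for name in EXPECTED if not (root / name).exists()]


def load_fill50k_offline(dataset_dir="./fill50k"):
    """Hypothetical helper: load the local script with offline mode enabled."""
    missing = missing_local_files(dataset_dir)
    if missing:
        raise FileNotFoundError(f"missing in {dataset_dir}: {missing}")
    # Imported here so the offline flag above is already set.
    from datasets import load_dataset
    return load_dataset(dataset_dir)
```

If load_fill50k_offline() completes, nothing was fetched from the Hub. Separately, the relative paths in the modified script ('./train.jsonl', './images', './conditioning_images') are resolved against the current working directory rather than the fill50k folder, so it may be safer to use absolute paths there.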
