Hi,

Since my Ubuntu machine can't connect to Hugging Face, I have to load the Fill50K dataset locally. I first ran `git clone https://huggingface.co/datasets/fusing/fill50k` on my PC, then uploaded the result to the Ubuntu machine and unzipped the image archives. The folder has the following directory structure:
- fill50k
  - fill50k.py
  - images
  - conditioning_images
  - train.jsonl
  - images.zip
  - conditioning_images.zip
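A quick check along these lines (my own throwaway snippet; `root` is just wherever the folder was uploaded) confirms that everything the loading script needs is in place:

```python
import os

# Hypothetical location of the uploaded dataset folder on the Ubuntu machine.
root = "./fill50k"

# Verify that every file/directory the loading script references is present.
for name in ["fill50k.py", "train.jsonl", "images", "conditioning_images"]:
    print(name, "->", os.path.exists(os.path.join(root, name)))
```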
I modified `_split_generators` in `fill50k.py` so that it points at the local files instead of downloading them:
```python
# METADATA_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="train.jsonl",
#     repo_type="dataset",
# )
# IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="images.zip",
#     repo_type="dataset",
# )
# CONDITIONING_IMAGES_URL = hf_hub_url(
#     "fusing/fill50k",
#     filename="conditioning_images.zip",
#     repo_type="dataset",
# )


def _split_generators(self, dl_manager):
    # metadata_path = dl_manager.download(METADATA_URL)
    # images_dir = dl_manager.download_and_extract(IMAGES_URL)
    # conditioning_images_dir = dl_manager.download_and_extract(
    #     CONDITIONING_IMAGES_URL
    # )
    metadata_path = './train.jsonl'
    images_dir = './images'
    conditioning_images_dir = './conditioning_images'
    return [
        datasets.SplitGenerator(
            name=datasets.Split.TRAIN,
            # These kwargs will be passed to _generate_examples
            gen_kwargs={
                "metadata_path": metadata_path,
                "images_dir": images_dir,
                "conditioning_images_dir": conditioning_images_dir,
            },
        ),
    ]
```
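One thing I'm not sure about is the relative paths: as far as I know, `'./train.jsonl'` resolves against the current working directory of whatever process calls `load_dataset`, not against the `fill50k` folder itself. An absolute-path variant (the `/home/xxx/fill50k` prefix below is only a placeholder for wherever the folder actually lives) would look like:

```python
import os

# Hypothetical absolute path to the uploaded folder; adjust as appropriate.
DATA_ROOT = "/home/xxx/fill50k"

# Inside _split_generators, replace the './...' paths with absolute ones so
# the result does not depend on where load_dataset() is invoked from.
metadata_path = os.path.join(DATA_ROOT, "train.jsonl")
images_dir = os.path.join(DATA_ROOT, "images")
conditioning_images_dir = os.path.join(DATA_ROOT, "conditioning_images")
```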
But when I run `dataset = load_dataset('./fill50k')`, it still downloads the files:
```python
from datasets import load_dataset

dataset = load_dataset('./fill50k')
```

which prints:

```
No config specified, defaulting to: fill50k/default
Downloading and preparing dataset fill50k/default to /home/xxx/.cache/huggingface/datasets/fill50k/default
```
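I also wondered whether forcing the library into offline mode would at least make any remote fetch fail loudly instead of happening silently; as I understand it, `datasets` honors an `HF_DATASETS_OFFLINE` environment variable, so something like this (untested sketch) should raise an error on any real download attempt:

```python
import os

# Must be set before importing datasets; as I understand it, this makes any
# attempted remote download raise an error instead of silently fetching.
os.environ["HF_DATASETS_OFFLINE"] = "1"

from datasets import load_dataset

dataset = load_dataset('./fill50k')
```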
How can I get `load_dataset` to use the local files instead? Thanks!