Error while cloning repository

Hi everyone,

I’m encountering an issue while trying to build my project. It’s failing with a very minimal error message:

Build Log:
===== Build Queued at 2024-11-06 07:46:28 / Commit SHA: b439bbc =====
“Error while cloning repository”

I can’t find any more error logs.

Here are my questions:

  1. Could this be related to the number or size of image files? I uploaded many image files to my Space.
  2. If it is, how can I upload a large number of image files to my Space?

Any guidance on resolving this issue or best practices for uploads would be greatly appreciated.

Here’s the link to my Space.

Thanks in advance!

In short, the problem you are encountering is probably a temporary bug on the HF side. The workaround is to put the data in a model repo or dataset repo and keep only the program in the Space. A handful of images in a Space shouldn’t be a problem, but a large number of them might be.
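Here is a minimal sketch of that split, assuming the images sit in a dataset repo and the Space holds only the app code. snapshot_download comes from huggingface_hub; the repo id in the usage note is a placeholder, and the download parameter is a hypothetical hook I added so the function can be tried without network access.

```python
from pathlib import Path
from typing import Callable, List, Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def fetch_data_dir(repo_id: str, download: Optional[Callable[..., str]] = None) -> Path:
    """Download a dataset repo into the local cache and return its folder."""
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    return Path(download(repo_id=repo_id, repo_type="dataset"))


def list_images(data_dir: Path) -> List[Path]:
    """Return every image file under data_dir, sorted for a stable order."""
    return sorted(p for p in data_dir.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)
```

In app.py you would call fetch_data_dir("your-username/your-image-data") once at startup and read the images from the returned folder, so the Space repo itself stays small.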

Can I upload a tgz file to my dataset repo? It would be much easier to upload all of my image files that way, if that’s okay.

Even free users can upload files of up to 50GB each and 300GB per repo (you can actually go higher, but that’s the nominal limit), and you can create as many repos as you like. Of course, you can also upload tgz files.
The question is whether the archive is easy to handle from the HF datasets library and similar tools; please refer to the following for that.

If that’s not a concern, just upload it.
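For reference, a rough sketch of packing a local image folder into a tgz and pushing it to a dataset repo. It assumes you are already logged in (for example with huggingface-cli login) and that the dataset repo exists. The function names and paths are made up for illustration; HfApi.upload_file is the huggingface_hub call doing the upload.

```python
import tarfile
from pathlib import Path


def pack_images(image_dir: str, archive_path: str) -> str:
    """Pack image_dir into a gzip-compressed tar archive and return its path."""
    src = Path(image_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the folder name, not the full local path.
        tar.add(str(src), arcname=src.name)
    return archive_path


def upload_archive(archive_path: str, repo_id: str) -> None:
    """Upload the archive to the root of an existing dataset repo."""
    # Imported lazily: only the upload step needs huggingface_hub.
    from huggingface_hub import HfApi

    HfApi().upload_file(
        path_or_fileobj=archive_path,
        path_in_repo=Path(archive_path).name,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Typical use would be pack_images("bird_data", "bird_data.tgz") followed by upload_archive("bird_data.tgz", "your-username/your-dataset"), where both names are placeholders.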

Sorry if this question is off topic, but I’m having trouble loading a dataset into my project. This is the container log.

===== Application Startup at 2024-11-06 10:51:02 =====

Traceback (most recent call last):
  File "/home/user/app/app.py", line 16, in <module>
    dataset = load_dataset("imagefolder", data_dir="hangunwoo07/Naturing_Bird_Data/bird_data_naturing.tgz")
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 2132, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 1853, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 1562, in dataset_module_factory
    ).get_module()
  File "/usr/local/lib/python3.10/site-packages/datasets/load.py", line 940, in get_module
    else get_data_patterns(base_path, download_config=self.download_config)
  File "/usr/local/lib/python3.10/site-packages/datasets/data_files.py", line 503, in get_data_patterns
    raise EmptyDatasetError(f"The directory at {base_path} doesn't contain any data files") from None
datasets.data_files.EmptyDatasetError: The directory at /home/user/app/hangunwoo07/Naturing_Bird_Data/bird_data_naturing.tgz doesn't contain any data files

I’m trying to load the dataset like this:

dataset = load_dataset("imagefolder", data_dir="hangunwoo07/Naturing_Bird_Data/bird_data_naturing.tgz")

def get_image_from_dataset(bird_name, image_filename):
    try:
        image_path = f"naturing_bird_image/naturing_bird_image/{bird_name}/{image_filename}"
        image_data = dataset[image_path]['image']

        if isinstance(image_data, Image.Image):
            img_byte_arr = io.BytesIO()
            image_data.save(img_byte_arr, format='JPEG')
            return img_byte_arr.getvalue()
        return image_data
        
    except Exception as e:
        print(f"Error accessing image from dataset: {e}")
        print(f"Attempted path: {image_path}")
        return None

How can I use the dataset in my project?

Maybe this? I don’t think it’s off-topic, since you’re only doing this to work around an error on the HF side.

Hi! data_dir= is for directories. Can you try using data_files= instead?
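Something along these lines, as a sketch only. It assumes hangunwoo07/Naturing_Bird_Data is a dataset repo with the tgz at its root, and it uses the hf://datasets/... path form to point data_files= at a file on the Hub. I have not checked that imagefolder unpacks this particular archive, so treat that as an assumption. The two function names are hypothetical helpers, not part of the datasets library.

```python
def hub_file_url(repo_id: str, filename: str) -> str:
    """Build the hf:// URL of a file stored in a dataset repo."""
    return f"hf://datasets/{repo_id}/{filename}"


def load_bird_images(repo_id: str, filename: str):
    """Load an image archive from a dataset repo with the imagefolder builder."""
    # Imported lazily: only this step needs the datasets library.
    from datasets import load_dataset

    # data_files= points at a file; data_dir= is read as a folder path,
    # which is why the traceback shows /home/user/app/... being searched.
    return load_dataset("imagefolder", data_files=hub_file_url(repo_id, filename))
```

Also note that the object you get back is indexed by split and then by row number, for example dataset["train"][0]["image"], not by a file path string as in get_image_from_dataset.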