Semantic Correspondence Dataset

Hi! I would like to convert the semantic correspondence dataset SPair (SPair-71k: A Large-scale Benchmark for Semantic Correspondence) into a Hugging Face dataset. Each of the 71k rows would contain two images and metadata. However, an image can occur in more than one row, so naively saving two images per row would be a huge waste of disk space. How can I make sure the images are saved only once in the dataset but can be referenced multiple times?
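For reference, this is roughly the structure I have in mind (the column names and file paths below are just placeholders, not the actual SPair-71k layout): each row would only hold references to the image files plus metadata, and every image file would exist once.

```python
from datasets import Dataset, Features, Value

# Placeholder pair list: each row references images by relative path,
# so the same image file can back many rows.
pairs = [
    {"src_img": "images/aeroplane/2008_000021.jpg",
     "trg_img": "images/aeroplane/2010_001347.jpg",
     "category": "aeroplane"},
    # ... ~71k rows in total
]

features = Features({
    "src_img": Value("string"),  # reference to the image, not the image bytes
    "trg_img": Value("string"),
    "category": Value("string"),
})

pairs_ds = Dataset.from_list(pairs, features=features)
```

The open question is how users would get actual images back from those references without the files being duplicated in the repo.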

Edit:
It seems like a custom dataset loading script might do the trick; however, that requires users to run untrusted code. Is this also possible without a custom loading script, or is such a script the way to go?

Can confirm that this is possible with a custom dataset loading script, as I managed to create and upload the dataset: 0jl/SPair-71k · Datasets at Hugging Face
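For anyone landing here later, the idea behind the loading script is roughly the sketch below (simplified, with placeholder archive and file names; it is not the exact script from the repo): the images ship once in an archive, and `_generate_examples` resolves each pair's file names against the extracted archive, so the `Image()` features decode from the shared files.

```python
import json
import os

import datasets

_ARCHIVE = "data/SPair-71k.tar.gz"  # placeholder: archive stored in the dataset repo


class SPair(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({
                "src_img": datasets.Image(),  # decoded from the shared image file
                "trg_img": datasets.Image(),
                "category": datasets.Value("string"),
            })
        )

    def _split_generators(self, dl_manager):
        root = dl_manager.download_and_extract(_ARCHIVE)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TEST,
                gen_kwargs={"root": root},
            )
        ]

    def _generate_examples(self, root):
        # pairs.json references images by file name, so one image file
        # can back many examples without being stored twice in the repo.
        with open(os.path.join(root, "pairs.json")) as f:
            pairs = json.load(f)
        for idx, pair in enumerate(pairs):
            yield idx, {
                "src_img": os.path.join(root, "images", pair["src_img"]),
                "trg_img": os.path.join(root, "images", pair["trg_img"]),
                "category": pair["category"],
            }
```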

However, I’m still interested in whether this is possible without a custom loading script.
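One script-free variant I could imagine (an untested sketch with a placeholder repo id and file names): upload the image files once together with a plain metadata file containing the pair paths, then have users resolve and decode those paths after loading.

```python
import os

from datasets import Image, load_dataset
from huggingface_hub import snapshot_download

repo_id = "user/SPair-71k"  # placeholder repo id
local_root = snapshot_download(repo_id, repo_type="dataset")

# pairs.csv (placeholder name) holds the ~71k rows with relative image paths
ds = load_dataset("csv", data_files=os.path.join(local_root, "pairs.csv"), split="train")

# Resolve the shared relative paths against the downloaded snapshot ...
ds = ds.map(lambda ex: {
    "src_img": os.path.join(local_root, ex["src_img"]),
    "trg_img": os.path.join(local_root, ex["trg_img"]),
})

# ... and let the Image feature decode them on access; each file is still stored once.
ds = ds.cast_column("src_img", Image()).cast_column("trg_img", Image())
```

The downside is that users have to run these extra steps themselves instead of getting decoded images directly from `load_dataset`.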