Hide download count on dataset page

Good morning,

Is there a way to hide the download count on the dataset page? In our case (MahmoodLab/hest · Datasets at Hugging Face), it's easier to use snapshot_download than load_dataset because of the format of the Spatial Transcriptomics data (.h5, .tiff), so the download count isn't incrementing.

Thank you
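
For context, here is a minimal sketch of the snapshot_download approach described above. The patterns are an assumption based on the extensions mentioned in the post, and preview_matches and download_subset are hypothetical helper names. As far as I know, huggingface_hub matches allow_patterns with fnmatch-style globbing, so the same matching can be previewed offline before downloading anything.

```python
from fnmatch import fnmatch

REPO_ID = "MahmoodLab/hest"
# Assumed patterns: the two extensions mentioned in the post.
PATTERNS = ["*.h5", "*.tiff"]


def preview_matches(repo_files, patterns):
    """Return the repo files that match at least one glob pattern."""
    return [f for f in repo_files if any(fnmatch(f, p) for p in patterns)]


def download_subset(local_dir, patterns=PATTERNS):
    """Download only the matching files, keeping their original names."""
    # Imported lazily so the preview helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

Calling download_subset("hest_data") would place the matching files under hest_data with their repository layout preserved.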

Actually, the main reason we are not using load_dataset is that files are renamed to a hash in the cache. Is there a way to write a custom dataset loading script (datasets.GeneratorBasedBuilder) so that files keep their original names?
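
To illustrate the renaming: the sketch below shows roughly how a download cache can derive a file name from the source URL (and optionally an ETag) instead of the original name. This is an illustration of the idea, not the exact code in datasets; hashed_cache_name is a hypothetical helper and the URL is a made-up example.

```python
import hashlib


def hashed_cache_name(url, etag=None):
    """Derive a cache file name from the URL, ignoring the original name."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A changed ETag yields a different cache entry for the same URL.
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


# Hypothetical URL, used only to show the effect.
url = "https://example.com/data/sample.h5"
print(hashed_cache_name(url))
```

The resulting name is a 64-character hex digest, so the original name and extension (.h5, .tiff) are no longer visible, which is a problem for tools that dispatch on the file extension.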

Using snapshot_download inside _split_generators seems to be the solution:

import datasets
from datasets import Features, Value
from huggingface_hub import snapshot_download


class HestDataset(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            description="HEST: A Dataset for Spatial Transcriptomics and Histology Image Analysis",
            homepage="https://github.com/mahmoodlab/hest",
            license="CC BY-NC-SA 4.0 Deed",
            features=Features({
                'path': Value('string')
            })
        )

    def _split_generators(self, dl_manager):
        # Resolved data files carry a repo prefix; keep only the part after
        # 'hest@main/' to get repo-relative paths.
        filenames = [f.split('hest@main/')[-1] for f in self.config_kwargs['data_files']['train']]
        # Download with the huggingface_hub API instead of dl_manager, so
        # files keep their original names rather than being renamed to a hash.
        snapshot_download(repo_id=self.repo_id, allow_patterns=filenames, repo_type="dataset", local_dir=self._cache_dir_root)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"filepath": filenames},
            )
        ]

    def _generate_examples(self, filepath):
        # Yield one example per file; 'path' is relative to the download directory.
        for idx, file in enumerate(filepath):
            yield idx, {
                'path': file
            }
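
One caveat: the split on 'hest@main/' only works when the resolved path contains that exact revision. Below is a minimal sketch of a more tolerant extraction, assuming entries have the form hf://datasets/<repo>@<revision>/<path>. The helper name repo_relative_path and the sample paths are hypothetical.

```python
def repo_relative_path(resolved_path):
    """Strip everything up to and including '<repo>@<revision>/'."""
    _, at, rest = resolved_path.partition("@")
    if not at:
        # No revision marker: assume the path is already repo-relative.
        return resolved_path
    _, slash, relative = rest.partition("/")
    return relative if slash else resolved_path


# Hypothetical paths, shown with a branch name and with a commit-like revision.
print(repo_relative_path("hf://datasets/MahmoodLab/hest@main/st/sample.h5ad"))
print(repo_relative_path("hf://datasets/MahmoodLab/hest@0123abc/wsis/sample.tif"))
```

This could replace the f.split('hest@main/')[-1] expression in _split_generators if the revision in the resolved paths turns out not to be main.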
