Image classification

jchirkes · October 1, 2023, 10:35pm

I need to train my model on tiles that come from 300 whole slide images. When the dataset is loaded using load_dataset, I lose the filename, so I am unable to see where the tiles come from. When I use train_test_split, it’s very important that all tiles that come from the same image are put in the same split. How can I keep the filename so that I can make sure this works correctly?

mariosasko · October 2, 2023, 3:11pm

You’ll be able to fetch the filenames once issue #5806 in huggingface/datasets on GitHub ("Return the name of the currently loaded file in the load_dataset function") is addressed.

In the meantime, this should work:

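import os
from datasets import load_dataset
ds = load_dataset(...)
...
# Add a "filename" column taken from the decoded PIL image; None if the image has no source path
ds = ds.map(lambda ex: {"filename": os.path.basename(ex["image"].filename) if ex["image"].filename else None})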
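
Once the filename column exists, the split itself can be done per slide rather than per tile. Below is a minimal sketch of one way to do that, not something taken from the thread. It assumes every filename is a non-empty string and that tiles are named "<slide>_<tile>.png", which is a hypothetical naming scheme; slide_id and grouped_split are illustrative helpers, not part of the datasets library. Note that test_size here is the fraction of slides, not of tiles.

```python
import random
from collections import defaultdict


def slide_id(filename):
    # Hypothetical naming scheme: "<slide>_<tile>.png", e.g. "slide001_0042.png".
    # Adapt this to however your tiles are actually named.
    return filename.split("_")[0]


def grouped_split(filenames, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) such that no slide appears in both."""
    # Collect the row indices of all tiles belonging to each slide.
    groups = defaultdict(list)
    for idx, name in enumerate(filenames):
        groups[slide_id(name)].append(idx)

    # Shuffle the slides (not the tiles) reproducibly.
    slides = sorted(groups)
    random.Random(seed).shuffle(slides)

    # Hold out a fraction of the slides, at least one.
    n_test = max(1, round(len(slides) * test_size))
    test_slides = set(slides[:n_test])

    train_idx = [i for s in slides if s not in test_slides for i in groups[s]]
    test_idx = [i for s in test_slides for i in groups[s]]
    return sorted(train_idx), sorted(test_idx)
```

With a single Dataset (for example ds["train"] if load_dataset returned a DatasetDict), the indices could then be applied with train_idx, test_idx = grouped_split(ds["filename"]) followed by ds.select(train_idx) and ds.select(test_idx).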
