Apologies for the spam.
I am currently trying to process an image dataset into a new representation.
Essentially, I am converting each image into a set of sub-images, and I would like each sub-image to be a single example in the dataset.
Currently I have the dataset of the form
Dataset({
    features: ['coordinates', 'filename', 'img', 'label', 'full_label'],
    num_rows: 11969
})
and I have a map function which converts it to:
Dataset({
    features: ['coordinates', 'filename', 'sub_images', 'label', 'full_label'],
    num_rows: 11969
})
where 'sub_images' is a list containing n sub-images.
I would like to convert this new dataset to the form:
Dataset({
    features: ['coordinates', 'filename', 'img', 'label', 'full_label'],
    num_rows: 11969*n
})
where each 'sub_images' field 'unrolls' into n separate rows, duplicating the corresponding coordinates, filename, label and full_label. I have attempted this with batched mapping using the following function:
import numpy as np

def patches_to_examples(example):
    return {"label": [example["label"] for _ in example["sub_images"]],
            "full_label": [example["full_label"] for _ in example["sub_images"]],
            "filename": [example["filename"] for _ in example["sub_images"]],
            "coordinates": [example["coordinates"] for _ in example["sub_images"]],
            "img": [np.array(image) for image in example["sub_images"]]}

ds = ds.map(patches_to_examples, batched=True, remove_columns=ds.column_names)
However, this only creates one row per example and stacks the images in a list, whereas I would like it to create len(sub_images) rows per example, with one image per row.
Any suggestions on where I'm going wrong?
Cheers in advance!
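
A possible cause, offered as a sketch rather than a confirmed diagnosis: with batched=True, the mapped function receives a dict that maps each column name to a list of values (one entry per row in the batch), so example["sub_images"] is a list of per-row lists and the comprehensions above iterate over rows instead of over sub-images. The version below loops over the rows of the batch and then over each row's sub-images. The toy_batch values are made up for illustration, and the np.array conversion is left out only to keep the example free of dependencies.

```python
def patches_to_examples(batch):
    """Unroll each row's list of sub-images into one output row per sub-image."""
    copied = ["coordinates", "filename", "label", "full_label"]
    out = {name: [] for name in copied}
    out["img"] = []
    # batch["sub_images"] holds one list of sub-images per input row
    for i, sub_images in enumerate(batch["sub_images"]):
        for image in sub_images:
            out["img"].append(image)  # wrap in np.array(image) if arrays are needed
            for name in copied:
                # duplicate the row's metadata for every sub-image
                out[name].append(batch[name][i])
    return out


# Hypothetical stand-in batch in the dict-of-lists layout used by batched mapping
toy_batch = {
    "coordinates": [[0, 0], [5, 5]],
    "filename": ["a.png", "b.png"],
    "label": [0, 1],
    "full_label": ["cat", "dog"],
    "sub_images": [["a0", "a1", "a2"], ["b0", "b1", "b2"]],
}
unrolled = patches_to_examples(toy_batch)

# With the real dataset, the call itself would stay as in the post:
# ds = ds.map(patches_to_examples, batched=True, remove_columns=ds.column_names)
```

Because every returned column has the same length (the total number of sub-images in the batch) and the original columns are removed, the batched map is free to return more rows than it received.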