Map function has different output shape

Leon299 · May 10, 2024, 12:51am

Hello all, I was trying to load the MNIST dataset and flatten the images, but I found that the map function gives me an extra dimension when I overwrite the existing key instead of adding a new one. It is not a big deal, but I am curious why this happens. Any help is really appreciated!

from datasets import load_dataset

ds = load_dataset("mnist", split="train[:10]").with_format("torch")

# Case 1: store the flattened image under a new key
def transforms(examples):
    examples['new_image'] = [image.reshape(-1) for image in examples['image']]
    return examples

new_ds = ds.map(transforms)
new_ds['new_image'].shape  # torch.Size([10, 1, 784]), which is what I expect

# Case 2: overwrite the existing 'image' key
def transforms_2(examples):
    examples['image'] = [image.reshape(-1) for image in examples['image']]
    return examples

new_ds2 = ds.map(transforms_2)
new_ds2['image'].shape  # torch.Size([10, 1, 1, 784]), which has an extra dimension
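
A possible way to reason about these shapes, as a dependency-free sketch rather than a confirmed diagnosis: without batched=True, map passes one example at a time, so the list comprehension loops over the first axis of a single image instead of over a batch. The sketch below uses nested lists as stand-ins for tensors and assumes each formatted image arrives with shape (1, 28, 28), which the reported shapes suggest. The helpers shape and flatten are hypothetical and exist only for this illustration.

```python
def shape(nested):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def flatten(nested):
    """Flatten a regular nested list, mimicking tensor.reshape(-1)."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


# Stand-in for ONE formatted example: (channel, height, width) = (1, 28, 28).
# This layout is an assumption inferred from the shapes in the post.
image = [[[0] * 28 for _ in range(28)]]

# Non-batched case: the comprehension iterates over the channel axis of a
# single example, so the result keeps a leading axis of length 1.
per_example = [flatten(channel) for channel in image]
print(shape(per_example))  # (1, 784)

# Batched case: the comprehension iterates over examples, so each image is
# flattened as a whole and no extra axis of length 1 remains.
batch = [image for _ in range(10)]
flat_batch = [flatten(example) for example in batch]
print(shape(flat_batch))  # (10, 784)
```

If this reading is right, calling ds.map(transforms, batched=True) should give a new_image column of shape [10, 784]. The additional dimension in the second case may come from the image column keeping its image feature type, so the flattened values could be stored as a 1 x 784 image and given a channel axis again when read back; that part is a guess and has not been verified here.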
