Questions about Dataset.map()

I ran into a problem when using map() for data preprocessing. My preprocessing function returns a 4D numpy.ndarray, but after calling map() and printing dataset.features, I found that the 'image' column is not a 4D numpy.ndarray but a nested Sequence. If I try to print the shape of a sample, I get an error: 'list' object has no attribute 'shape'.
Here is a snapshot of my code and the output of dataset.features:

dataset.features

{'flair_path': Value(dtype='string', id=None), 't1ce_path': Value(dtype='string', id=None), 'seg_path': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None), 'image': Sequence(feature=Sequence(feature=Sequence(feature=Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), length=-1, id=None), length=-1, id=None), length=-1, id=None)}

Maybe you can use the Array4D feature type instead?


I followed your advice and explicitly declared the type of 'image' as Array4D. When I print the dataset's features, the 'image' feature is indeed Array4D, but when I try to print dataset[0]['image'].shape, it still throws the same error: 'list' object has no attribute 'shape'.

Can you share the stack trace?

Sure.

Exception has occurred: AttributeError
'list' object has no attribute 'shape'
File "/root/autodl-tmp/dataset/dataset.py", line 90, in
print(dataset[0]['image'].shape)
AttributeError: 'list' object has no attribute 'shape'

Another interesting point is that when I add the following lines of code:

The output looks like this:

<class 'list'>
Image saved to /root/autodl-tmp/dataset/first_image.npy
Loaded image shape: (16, 160, 160, 2)

I see! You're getting Python lists, which is the default behavior, while you want to get NumPy arrays.

To get numpy arrays you can change the dataset format to “numpy”:

dataset = dataset.with_format("numpy")

Now all the examples will be returned as NumPy arrays 🙂

That works. Thank you so much for your patience!