Missing ['array'] when using map() method

Hello everyone. When I use the map() method to modify my data, it raises KeyError: 'array'.
But I'm sure my dataset includes this column.


And the output is:

    features: ['audio', 'gender'],
    num_rows: 16960
tensor([-8.2690e-14, -7.3000e-13,  1.5195e-13,  ...,  8.2001e-07,
         9.9102e-07, -3.9292e-07], device='cuda:0')

My code is:

from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

def preprocess_function(examples):
    # Collect the raw waveform from each audio example in the batch.
    audio_arrays = [x["array"] for x in examples["audio"]]
    # Truncate to at most 16000 samples and pad to the longest example in the batch.
    inputs = feature_extractor(
        audio_arrays, sampling_rate=feature_extractor.sampling_rate, max_length=16000,
        truncation=True, padding=True,
    )
    return inputs

ds_train = ds_train.map(preprocess_function, remove_columns="audio", batched=True)

I'm not sure what happened. Thank you in advance.

I'm seeing this behavior too. The 'array' key is there when I index into the dataset manually, but inside the map() callback function each audio entry is only {'bytes': None, 'path': './data/xyz.mp3'}.

It only seems to happen if I have called dataset.set_format(type='torch') first. If I don't call that before map(), then it works as expected.
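For reference, here is a minimal sketch of that ordering: run map() while the dataset is in its default format, and only switch to the torch format afterwards. The helper name map_in_python_format is made up for illustration and is not part of the datasets library. It assumes that with_format(None) gives back a view of the dataset in the default Python format, which is what I'd expect from the datasets API, but I have only reasoned about it from the behavior described above.

```python
def map_in_python_format(ds, fn, final_format="torch", **map_kwargs):
    """Run ds.map(fn) in the default format, then apply final_format.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    # Assumption: with_format(None) returns a dataset using the default
    # Python format, so the callback should see the same examples it would
    # see if set_format() had never been called.
    plain = ds.with_format(None)
    mapped = plain.map(fn, **map_kwargs)
    # Only now switch to the tensor format needed for training.
    return mapped.with_format(final_format)


# Usage with the names from the question (needs the real dataset):
# ds_train = map_in_python_format(
#     ds_train, preprocess_function, remove_columns="audio", batched=True
# )
```

If you don't need a helper, the same idea is just: don't call set_format(type='torch') until after map() has finished.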


That works! Thank you!