Expanding an Audio Dataset with datasets.map()?

Hello
After a couple of hours of trying to get this to work, I need to ask.

I’m trying to expand a dataset (used with Wav2Vec2 for ASR) with the following idea:

dataset_expansion = dataset

# add simple noise to the dataset expansion
def add_simple_noise(batch):
    audio = batch["audio_path"]
    noise = np.asarray(0.01 * np.random.randn(len(audio["array"])))
    audio["array"] = audio["array"] + noise
    return batch

# map simple noise to the training set
dataset_expansion = dataset_expansion.map(add_simple_noise)
dataset_expansion = dataset_expansion.cast_column("audio_path", datasets.Audio(sampling_rate=16_000))

The mapping itself seems to work and noise is added.
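
For reference, a minimal sketch of the noise step on its own, outside of map(). It assumes the decoded audio is a NumPy float array, as in the sample shown further down the thread. The helper name with_gaussian_noise, the noise_std parameter and the seeded generator are illustrative and not part of the original script. Unlike the function above, it returns a new array instead of modifying its input, and it casts back to the input dtype.

```python
import numpy as np


def with_gaussian_noise(samples, noise_std=0.01, rng=None):
    """Return a noisy copy of `samples`; the input array is left untouched.

    Assumes `samples` holds floating-point audio (hypothetical helper).
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples)
    noise = rng.normal(loc=0.0, scale=noise_std, size=samples.shape)
    # Cast back so a float32 waveform does not silently become float64.
    return (samples + noise).astype(samples.dtype)


# Example on a synthetic one-second, 16 kHz silent clip.
clean = np.zeros(16_000, dtype=np.float32)
noisy = with_gaussian_noise(clean, noise_std=0.01, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the augmentation reproducible between runs, which may help when comparing training results.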

But the result of the mapping does not seem to be correct.

Trying to concatenate the datasets:
dataset = datasets.concatenate_datasets([dataset["train"], dataset_expansion["train"]])

throws the following error:
ArrowInvalid: Schema at index 1 was different:
audio_path: string
text: string
sampling_rate: int64
train_or_test: string
vs
audio_path: struct<array: list<item: double>, path: string, sampling_rate: int64>
text: string
sampling_rate: int64
train_or_test: string

Checking the features:
dataset["train"].features
{'audio_path': Audio(sampling_rate=16000, mono=True, id=None),
 'text': Value(dtype='string', id=None),
 'sampling_rate': Value(dtype='int64', id=None),
 'train_or_test': Value(dtype='string', id=None)}

dataset_expansion["train"].features
{'audio_path': Audio(sampling_rate=16000, mono=True, id=None),
 'text': Value(dtype='string', id=None),
 'sampling_rate': Value(dtype='int64', id=None),
 'train_or_test': Value(dtype='string', id=None)}

The dataset was loaded as follows:
feature_dict = {"audio_path": datasets.Audio(sampling_rate=16_000), "text": datasets.Value("string")}
data_features = datasets.Features(feature_dict)

dataset = load_dataset(
    "csv",
    data_files={"train": "toy_train_data.csv", "test": "toy_test_data.csv"},
)
dataset = dataset.cast_column("audio_path", datasets.Audio(sampling_rate=16_000, mono=True))
dataset = dataset.remove_columns("Unnamed: 0")

I might just be missing something really small, but I can’t seem to find whatever needs to be done to get this to work.

The alternative to doing this on the fly would be to make a copy of the data and add noise there.

If anyone could point me in the right direction, I’d really appreciate it.

Thank you and have a great day.


Additional information:

dataset["train"][0]["audio_path"]
{'path': './audio/ch_ag_0006.wav',
 'array': array([-3.0517578e-05, -3.0517578e-05, -3.0517578e-05, ...,
        -1.2207031e-04, -9.1552734e-05,  0.0000000e+00], dtype=float32),
 'sampling_rate': 16000}

dataset_expansion["train"][0]["audio_path"]

AttributeError: 'dict' object has no attribute 'endswith'

Hi! Which version of datasets are you using? I’m pretty sure this issue can be resolved by using the newest version of datasets, which you can install as follows:

pip install -U datasets

Let me know if that doesn’t help.
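
To confirm which versions are actually installed in the environment that runs the script, something like the following sketch could be used. It relies only on the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages discussed in this thread.
print("datasets:", installed_version("datasets"))
print("transformers:", installed_version("transformers"))
```

Running this from the same interpreter or notebook kernel as the training script avoids checking a different environment by accident.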

Hello Mario
Thank you for your time.
That was one of my first assumptions, as the original version ran on datasets 1.17 (and didn’t work as expected). This version of the script runs on datasets 2.0.0 (and transformers 4.17.0).

In the meantime I’m doing the audio expansion locally and uploading the expanded audio to run my tests. Not quite as elegant as I would like, but it works.
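
A rough sketch of what such a local expansion step could look like, for anyone following the same route. This is not the script actually used here: it assumes 16-bit PCM WAV input, uses only the standard wave module plus NumPy, and the function name write_noisy_copy and its parameters are hypothetical.

```python
import wave

import numpy as np


def write_noisy_copy(src_path, dst_path, noise_std=0.01, seed=None):
    """Write a copy of a 16-bit PCM WAV file with Gaussian noise added."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = src.readframes(params.nframes)

    # Scale int16 samples to [-1, 1) so noise_std is comparable to float audio.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    rng = np.random.default_rng(seed)
    noisy = samples + rng.normal(0.0, noise_std, size=samples.shape)

    # Clip before converting back to avoid integer wrap-around.
    pcm = (np.clip(noisy, -1.0, 1.0 - 1.0 / 32768.0) * 32768.0).astype("<i2")
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and frame rate
        dst.writeframes(pcm.tobytes())
```

A call such as write_noisy_copy("clean.wav", "noisy.wav", seed=0) (hypothetical file names) would then produce one augmented file per source file, which can be listed in the CSV next to the originals.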

Cheers

Stefan