Hello,
After a couple of hours trying to get this to work, I need to ask.
I’m trying to expand a dataset (used with Wav2Vec2 for ASR) with the following idea:
import numpy as np
import datasets

dataset_expansion = dataset

# add simple Gaussian noise to the dataset expansion
def add_simple_noise(batch):
    audio = batch["audio_path"]
    noise = np.asarray(0.01 * np.random.randn(len(audio["array"])))
    audio["array"] = audio["array"] + noise
    return batch

# map simple noise onto the dataset
dataset_expansion = dataset_expansion.map(add_simple_noise)
dataset_expansion = dataset_expansion.cast_column("audio_path", datasets.Audio(sampling_rate=16_000))
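The noise step above can also be tried in isolation, outside of `map`. The following is only a minimal sketch using NumPy alone; the function name `add_gaussian_noise` and its `scale` and `seed` parameters are illustrative and not part of the original code. It assumes the decoded audio is a floating-point array (for example float32). Note that `np.random.randn` produces float64 values, so adding them to a float32 array yields float64 unless the result is cast back.

```python
import numpy as np


def add_gaussian_noise(samples, scale=0.01, seed=None):
    """Return a noisy copy of `samples`; the input array is not modified.

    Assumes `samples` holds floating-point audio (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    # standard_normal draws float64 values, so the sum is float64 as well
    noise = scale * rng.standard_normal(samples.shape)
    # cast back so that float32 audio stays float32
    return (samples + noise).astype(samples.dtype)


# one second of silence at 16 kHz, used here only as stand-in data
clean = np.zeros(16_000, dtype=np.float32)
noisy = add_gaussian_noise(clean, scale=0.01, seed=0)
```

Returning a new array instead of writing into the input keeps the clean signal available for comparison.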
The mapping itself seems to work, and noise is added.
But the result of the mapping does not seem to be correct.
Trying to concatenate the two datasets:
dataset = datasets.concatenate_datasets([dataset["train"], dataset_expansion["train"]])
throws the following error:
ArrowInvalid: Schema at index 1 was different:
audio_path: string
text: string
sampling_rate: int64
train_or_test: string
vs
audio_path: struct<array: list<item: double>, path: string, sampling_rate: int64>
text: string
sampling_rate: int64
train_or_test: string
Checking the features:
dataset["train"].features
{'audio_path': Audio(sampling_rate=16000, mono=True, id=None),
 'text': Value(dtype='string', id=None),
 'sampling_rate': Value(dtype='int64', id=None),
 'train_or_test': Value(dtype='string', id=None)}
dataset_expansion["train"].features
{'audio_path': Audio(sampling_rate=16000, mono=True, id=None),
 'text': Value(dtype='string', id=None),
 'sampling_rate': Value(dtype='int64', id=None),
 'train_or_test': Value(dtype='string', id=None)}
The dataset was loaded as follows:
import datasets
from datasets import load_dataset

feature_dict = {"audio_path": datasets.Audio(sampling_rate=16_000), "text": datasets.Value("string")}
data_features = datasets.Features(feature_dict)
dataset = load_dataset("csv",
                       data_files={"train": "toy_train_data.csv",
                                   "test": "toy_test_data.csv"},
                       )
# decode the file paths in "audio_path" as 16 kHz mono audio
dataset = dataset.cast_column("audio_path", datasets.Audio(sampling_rate=16_000, mono=True))
dataset = dataset.remove_columns("Unnamed: 0")
I might just be missing something really small, but I can’t seem to find what needs to be done to get this to work.
The alternative to doing this on the fly would be to make a copy of the data and add noise there.
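For reference, that offline alternative could look roughly like the sketch below. It uses only the Python standard library and assumes the source files are uncompressed 16-bit PCM WAV; the helper name `write_noisy_copy` and its parameters are hypothetical, and `scale` is interpreted relative to full scale so that 0.01 roughly matches the 0.01 factor used on normalized audio above.

```python
import array
import random
import sys
import wave


def write_noisy_copy(src_path, dst_path, scale=0.01, seed=None):
    """Write a copy of a 16-bit PCM WAV file with Gaussian noise added.

    Hypothetical helper; returns the number of samples written.
    """
    rng = random.Random(seed)
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = src.readframes(params.nframes)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    sigma = scale * 32767  # noise level relative to 16-bit full scale
    noisy = array.array(
        "h",
        # round and clip so every value still fits into a signed 16-bit sample
        (max(-32768, min(32767, round(s + rng.gauss(0.0, sigma)))) for s in samples),
    )
    if sys.byteorder == "big":
        noisy.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width, and sample rate
        dst.writeframes(noisy.tobytes())
    return len(noisy)
```

Each noisy file would then get its own row in the CSV (same text, new path), so both the clean and the noisy data are loaded through the same `load_dataset` and `cast_column` steps and share one schema.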
If anyone could point me in the right direction, I’d really appreciate it.
Thank you and have a great day.