What's the recommended way to save resampled audio?
From here, I gather that Hugging Face recommends using cast_column to set the target sampling rate of the audio and to resample on-the-fly. But it seems to me that we often still need to save a resampled version, because the model's feature extractor expects "input_values" as input rather than "audio": "array".
I suppose I could do something like:
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
dataset = dataset.map(lambda x: {'input_values': x['audio']['array']})
However, this seems to be very slow.
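To put a number on "slow", a minimal timing sketch like the one below can help. The helper name `examples_per_second` is hypothetical (it is not part of `datasets`); it simply times any per-index function with `time.perf_counter`, so `fetch` could be something like `lambda i: dataset[i]["audio"]["array"]`. A stand-in workload is used here so the sketch runs without a dataset.

```python
import time

def examples_per_second(fetch, n):
    """Call fetch(i) for i in range(n) and return the average rate.

    `fetch` is a hypothetical stand-in for whatever loads one example.
    """
    start = time.perf_counter()
    for i in range(n):
        fetch(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return n / max(elapsed, 1e-9)

# Stand-in workload so the sketch runs without any dataset.
rate = examples_per_second(lambda i: sum(range(1000)), 200)
print(f"{rate:.1f} examples/s")
```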
Below is an example script:
from datasets import load_dataset, Audio
from tqdm import tqdm

dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
# Decode and resample to 16 kHz whenever the "audio" column is accessed
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

# Approach A: dataset.map
dataset.cleanup_cache_files()
A = dataset.map(lambda x: {'resampled': x['audio']['array']})

# Approach B: plain Python loop, then add_column
dataset.cleanup_cache_files()
B = [None] * len(dataset)
for i in tqdm(range(len(B))):
    B[i] = dataset[i]['audio']['array']
# add_column returns a new dataset; the result is not stored here
dataset.add_column('resampled', B)
Below is my output:
In [17]: A = dataset.map(lambda x: {'resampled': x['audio']['array']})
Map:  99%|██████████| 557/563 [00:19<00:00, 412.09 examples/s]
Map: 100%|██████████| 563/563 [01:06<00:00, 8.41 examples/s]
In [18]: dataset.cleanup_cache_files()
    ...: B = [None] * len(dataset)
    ...: from tqdm import tqdm
    ...: for i in tqdm(range(len(B))):
    ...:     B[i] = dataset[i]['audio']['array']
    ...: dataset.add_column('resampled', B)
100%|██████████| 563/563 [00:01<00:00, 448.90it/s]
In particular, the first ~550 examples ran very fast, but then the progress bar got stuck for perhaps a minute before finishing, so the overall throughput is only about 8 examples per second. On the other hand, if I simply perform the resampling in a loop, there's no such delay (~450 examples/s).
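For reference, the overall rate can be re-derived from the two Map progress lines. The short calculation below only redoes that arithmetic with values copied from the output (557 of 563 examples at 00:19, all 563 at 01:06). The elapsed times are shown in whole seconds, so the result differs slightly from the 8.41 examples/s that was printed, and the gap between the two captured lines is only a rough lower bound on how long the bar was stuck.

```python
total_examples = 563
examples_on_first_line = 557
seconds_on_first_line = 19   # "00:19" on the 99% line
seconds_total = 66           # "01:06" on the 100% line

# Average rate over the whole map call
overall_rate = total_examples / seconds_total

# Time and examples between the two captured progress lines
gap_seconds = seconds_total - seconds_on_first_line
remaining = total_examples - examples_on_first_line

print(f"overall: {overall_rate:.2f} examples/s")
print(f"last {remaining} examples: about {gap_seconds} s between the two lines")
```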