Saving a resampled version of audio

tshmak · April 18, 2024, 4:24am

What’s the recommended way to save resampled audio? From here, I understand that Hugging Face recommends using cast_column to set the target sampling rate of the audio and resample on the fly. But it seems to me that we often still need to save a resampled version, because the model’s feature extractor expects “input_values” as input rather than “audio”:“array”.

I suppose I could do something like:

dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
dataset = dataset.map(lambda x: {'input_values': x['audio']['array']})

However, this seems to be very slow.

Below is an example script:

from datasets import load_dataset, Audio

dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

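# Approach A: dataset.map (decodes, resamples, and writes a new column)
dataset.cleanup_cache_files()
A = dataset.map(lambda x: {'resampled': x['audio']['array']})

# Approach B: plain Python loop (decodes and resamples, keeps arrays in memory)
dataset.cleanup_cache_files()
B = [None] * len(dataset)
from tqdm import tqdm
for i in tqdm(range(len(B))):
    B[i] = dataset[i]['audio']['array']
# add_column returns a new dataset, so keep the result
C = dataset.add_column('resampled', B)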

Below is my output:

In [17]: A = dataset.map(lambda x: {'resampled': x['audio']['array']})
Map:  99%|██████████████████████████████████████████████████████████████████████████████▏| 557/563 [00:19<00:00, 412.09 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████████████████████| 563/563 [01:06<00:00,  8.41 examples/s]

In [18]: dataset.cleanup_cache_files()
    ...: B = [None] * len(dataset)
    ...: from tqdm import tqdm
    ...: for i in tqdm(range(len(B))):
    ...:     B[i] = dataset[i]['audio']['array']
    ...: dataset.add_column('resampled', B)
100%|███████████████████████████████████████████████████████████████████████████████████████████| 563/563 [00:01<00:00, 448.90it/s]

In particular, after about 500 iterations, which ran very fast, the progress bar stalled for perhaps a minute before finishing, so the overall throughput was only about 8 examples per second. On the other hand, if I simply resample in a loop, there’s no such delay (~400 examples/sec).
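To put numbers on the stall, here is a small sketch that only redoes the arithmetic from the two progress bars above. The variable names are mine, and the timings are read off the tqdm output, which shows whole seconds, so treat the results as rough.

```python
# Figures read off the progress bars in the output above (whole seconds).
n_examples = 563
map_fast_phase_s = 19   # elapsed time when the Map bar showed 557/563
map_total_s = 66        # elapsed time when the Map bar finished (01:06)
loop_rate = 448.90      # it/s reported by tqdm for the plain loop

# Time the Map bar spent on the last few examples before finishing.
map_stall_s = map_total_s - map_fast_phase_s

# Effective throughput of map over the whole run.
map_overall_rate = n_examples / map_total_s

# Total time implied by the plain loop's reported rate.
loop_total_s = n_examples / loop_rate

print(f"map stall:        {map_stall_s} s")
print(f"map overall rate: {map_overall_rate:.1f} examples/s")
print(f"loop total time:  {loop_total_s:.2f} s")
print(f"map / loop time:  {map_total_s / loop_total_s:.0f}x")
```

So most of the map time is the stall at the end rather than the per-example work, which ran at roughly the same speed as the loop until then.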
