Is Manual Audio Resampling Required?

itisyeetimetoday · February 7, 2022, 11:52pm

In the Speech Recognition tutorial docs (speech_to_text_2), this line of code reads the audio files: speech, _ = sf.read(batch["file"])
Because the second return value is assigned to _, the sample rate is discarded.
Later, the audio is prepared for the model here: inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")
I notice the sampling rate passed here is 16 kHz. If my files are not 16 kHz (for example, 44,100 Hz), do I need to downsample them to 16 kHz manually, e.g. with librosa, or will the processor call downsample automatically for me?
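
For reference, here is a minimal sketch of what manual resampling could look like. As far as I know, the feature extractors behind these processors do not resample; the sampling_rate argument only declares the rate of the audio you pass in (and a mismatch with the expected rate is typically reported as an error), so treat that as an assumption to verify against your installed version. The helper names to_target_rate and load_speech are hypothetical, not part of any library.

```python
TARGET_SR = 16_000  # rate passed to the processor in the tutorial


def to_target_rate(speech, orig_sr, target_sr=TARGET_SR):
    """Return `speech` at `target_sr`, resampling only when the rates differ."""
    if orig_sr == target_sr:
        return speech  # already at the expected rate, nothing to do
    # Imported lazily so the helper has no hard dependency on librosa
    # when the audio is already at the target rate.
    import librosa
    return librosa.resample(speech, orig_sr=orig_sr, target_sr=target_sr)


def load_speech(path, target_sr=TARGET_SR):
    """Read an audio file and keep the sample rate instead of discarding it."""
    import soundfile as sf
    speech, orig_sr = sf.read(path)  # sf.read returns (data, samplerate)
    return to_target_rate(speech, orig_sr, target_sr)
```

With this, speech = load_speech(batch["file"]) would replace the speech, _ = sf.read(batch["file"]) line, and the later processor call with sampling_rate=16_000 would then describe the audio correctly. Note that this sketch assumes mono files; multi-channel audio would need to be downmixed first.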
