Excuse me, I am a newbie. My question is about fine-tuning ASR models (in particular facebook/s2t-small-librispeech-asr) on my own dataset of recordings. Where can I find an example of how to do that?
Thanks in advance,
Hi, there is a task guide for ASR in the docs here.
Thank you so much. The guide above uses a Dataset object from the datasets library, while in my case I only have the raw WAV files. How can I convert this data so that I can fine-tune the ASR model?
You can create your own audio dataset from your files to get a Dataset object. The easiest option is probably the AudioFolder builder. Put your audio files in a local folder (or, if you prefer, create a dataset repo on the Hub and upload them there). Then you can load the folder like:
```python
from datasets import load_dataset

# Builds a dataset from the audio files found under data_dir
dataset = load_dataset("audiofolder", data_dir="/path/to/data")
```
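For ASR, each clip also needs its transcription. AudioFolder can read extra columns from a metadata.csv file placed next to the audio files, as long as it has a file_name column with each file's path relative to that CSV. Below is a minimal sketch that writes such a file from a Python dict. The helper name write_audiofolder_metadata, the column name transcription and the flat folder layout are assumptions for illustration, not part of the builder's API.

```python
import csv
from pathlib import Path
from typing import Mapping, Union


def write_audiofolder_metadata(
    data_dir: Union[str, Path], transcripts: Mapping[str, str]
) -> Path:
    """Write metadata.csv pairing each audio file with its transcript.

    Hypothetical helper: assumes all audio files sit directly in data_dir and
    that the transcript column is called "transcription".
    """
    data_dir = Path(data_dir)
    # Check every referenced file exists before writing anything.
    missing = [name for name in transcripts if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"audio files not found in {data_dir}: {missing}")

    path = data_dir / "metadata.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # AudioFolder links rows to audio files through the file_name column.
        writer.writerow(["file_name", "transcription"])
        for file_name, text in sorted(transcripts.items()):
            writer.writerow([file_name, text])
    return path
```

With this file in place, the load_dataset("audiofolder", ...) call shown above should return an audio column plus a transcription column.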
Check out the docs here for more details!
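Once the dataset loads, it still has to be turned into model inputs before fine-tuning. A rough sketch for facebook/s2t-small-librispeech-asr, assuming the transcription column from a metadata.csv like the one above and that datasets (with its audio extras) and transformers (with sentencepiece) are installed. The function names prepare_example and load_training_set are hypothetical; padding, the data collator and the training loop are left to the task guide.

```python
from typing import Any, Callable, Dict


def prepare_example(
    example: Dict[str, Any], processor: Callable[..., Dict[str, Any]]
) -> Dict[str, Any]:
    """Convert one AudioFolder row into input features and label ids.

    Assumes example["audio"] is the dict produced by the datasets Audio
    feature and the transcript is in a (hypothetical) "transcription" column.
    """
    audio = example["audio"]
    # The processor runs the feature extractor on the waveform and the
    # tokenizer on `text`, returning the token ids under "labels".
    encoded = processor(
        audio["array"],
        sampling_rate=audio["sampling_rate"],
        text=example["transcription"],
    )
    # The feature extractor returns a batch of one; keep that single item.
    return {"input_features": encoded["input_features"][0], "labels": encoded["labels"]}


def load_training_set(data_dir: str, checkpoint: str = "facebook/s2t-small-librispeech-asr"):
    # Imported lazily so prepare_example can be used without these packages.
    from datasets import Audio, load_dataset
    from transformers import Speech2TextProcessor

    processor = Speech2TextProcessor.from_pretrained(checkpoint)
    dataset = load_dataset("audiofolder", data_dir=data_dir, split="train")
    # The checkpoint expects 16 kHz audio, so resample when samples are decoded.
    dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
    dataset = dataset.map(
        lambda ex: prepare_example(ex, processor),
        remove_columns=dataset.column_names,
    )
    return dataset, processor
```

Keeping prepare_example independent of the loading code makes it easy to check on a single row before running the full map.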