Help with fine-tuning ASR models

Hi all,

Sorry, I'm new to this. I'd like to fine-tune an ASR model (specifically facebook/s2t-small-librispeech-asr on Hugging Face) on a custom dataset made of my own recordings. Where can I find an example of how to do that?

Thanks in advance,


Hi, there is a task guide for automatic speech recognition (ASR) in the Transformers docs here 🙂

Thank you so much. The guide above uses a 🤗 Datasets Dataset object, but in my case I only have raw WAV files. How can I convert this data so that I can fine-tune the ASR model?
Thank you,
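Before building a dataset from raw WAV files, it can help to check their format. facebook/s2t-small-librispeech-asr was trained on LibriSpeech, which is 16 kHz audio, so the sketch below assumes the checkpoint expects 16 kHz mono input. It uses only Python's standard wave module, which reads uncompressed PCM WAV files. The function name find_nonconforming_wavs is hypothetical, chosen for illustration.

```python
import wave
from pathlib import Path

# Assumption: the checkpoint's feature extractor expects 16 kHz mono audio.
TARGET_RATE = 16_000


def find_nonconforming_wavs(data_dir, target_rate=TARGET_RATE):
    """Return (file name, sample rate, channels) for each WAV in data_dir
    that is not mono at target_rate. Files that look fine are skipped."""
    problems = []
    for path in sorted(Path(data_dir).glob("*.wav")):
        # wave only handles uncompressed PCM WAV; other encodings raise wave.Error
        with wave.open(str(path), "rb") as wav:
            rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != target_rate or channels != 1:
            problems.append((path.name, rate, channels))
    return problems


# Example (hypothetical path):
# print(find_nonconforming_wavs("/path/to/data"))
```

If some recordings use a different sample rate, you don't necessarily have to convert the files yourself: after loading, 🤗 Datasets can resample on the fly with dataset.cast_column("audio", Audio(sampling_rate=16000)), where Audio is imported from datasets.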

You can create your own audio dataset from your files to get a Dataset object.

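The easiest option is probably the AudioFolder builder. Put your audio files in a folder (you can also upload that folder to a dataset repo on the Hub). For ASR, also add a metadata.csv file with a file_name column and a column holding each recording's transcription. Then you can load it like: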

```python
from datasets import load_dataset

# Load a local folder of audio files (plus optional metadata.csv) as a DatasetDict
dataset = load_dataset("audiofolder", data_dir="/path/to/data")
```
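For fine-tuning, each recording also needs its transcription. AudioFolder can pick this up from a metadata.csv file in the data folder: the file_name column names each audio file, and any other column is loaded as an extra feature. Below is a minimal sketch that writes such a file from a Python dict. The helper name write_audiofolder_metadata and the column name "transcription" are assumptions for illustration; use whatever column name your training code reads.

```python
import csv
from pathlib import Path


def write_audiofolder_metadata(data_dir, transcripts):
    """Write data_dir/metadata.csv pairing each audio file with its text.

    transcripts maps a file name relative to data_dir (e.g. "rec_001.wav")
    to its transcription. Raises FileNotFoundError if a listed file is missing.
    """
    data_dir = Path(data_dir)
    missing = [name for name in transcripts if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"audio files not found in {data_dir}: {missing}")

    metadata_path = data_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields containing commas automatically
        # AudioFolder matches rows to audio files via the "file_name" column.
        writer.writerow(["file_name", "transcription"])
        for name, text in sorted(transcripts.items()):
            writer.writerow([name, text])
    return metadata_path


# Example (hypothetical files and transcripts):
# write_audiofolder_metadata("/path/to/data", {"rec_001.wav": "hello world"})
```

After this, the load_dataset("audiofolder", ...) call should return each example with both an audio column and a transcription column.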

Check out the docs here for more details!