Loading custom audio dataset and fine-tuning model

Chace · July 30, 2021, 11:33am

Hi all. I’m very new to HuggingFace and I have a question that I hope someone can help with.

Someone suggested the XLSR-53 (Wav2Vec) model for my use case, which is speech-to-text. However, the languages I need aren’t supported, so I was told I have to fine-tune the model for them. I’ve read several docs and tutorials, but they all use Common Voice, which doesn’t cover my languages either.

I have ~4 hours of audio files plus TSV files with the annotations for the audio, but I’m not sure how to load them and fine-tune the model on them. I can’t find much information online either. Is there a reference I can follow?

Any help would be appreciated.
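One way to get audio plus TSV annotations into 🤗 Datasets is sketched below. It assumes each TSV row has a `path` column (audio file name relative to a clips directory) and a `sentence` column with the transcript, i.e. the Common Voice layout; those column names and both helper names are hypothetical, so adjust them to your files. The result has the same `audio`/`text` shape the replies below build from DataFrames and CSV files.

```python
import csv
import os


def read_tsv_manifest(tsv_path, audio_dir, path_col="path", text_col="sentence"):
    """Read a TSV annotation file into {"audio": [...], "text": [...]} columns.

    The default column names follow the Common Voice layout (an assumption);
    pass your own names if your TSV files differ.
    """
    audio_paths, texts = [], []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters inside transcripts as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            audio_paths.append(os.path.join(audio_dir, row[path_col]))
            texts.append(row[text_col].strip())
    return {"audio": audio_paths, "text": texts}


def manifest_to_dataset(manifest, sampling_rate=16000):
    """Wrap the manifest in a datasets.Dataset whose "audio" column decodes on access."""
    # The `datasets` package is only needed when this function is called.
    from datasets import Audio, Dataset

    dset = Dataset.from_dict(manifest)
    # XLSR-53 was pretrained on 16 kHz audio; Audio resamples files that differ.
    return dset.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

For example, `manifest_to_dataset(read_tsv_manifest("train.tsv", "clips"))` (file names hypothetical) yields a training split that only stores paths until an example is accessed.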

nikhil6041 · November 4, 2021, 1:12pm

@patrickvonplaten I am also trying this for a similar use case, but I couldn’t find any example script for audio datasets other than Common Voice. I have several datasets that aren’t available on Hugging Face Datasets, and because almost all the scripts rely so heavily on Hugging Face Datasets, it’s hard to work out how to adapt them to my use case. If you could suggest any resources or changes that let me use my own dataset instead of Common Voice or another dataset on Hugging Face Datasets, it would be a great help.

weirdguitarist · July 13, 2022, 11:40am

Hi. I’m trying to do the same thing. I loaded my data into a DataFrame with “file” and “text” columns, similar to available datasets like Common Voice, but I’m not sure what to do with the audio so that it can be processed with the Audio feature of Hugging Face Datasets. Did you find a solution?

mariosasko · July 13, 2022, 1:54pm

Hi @weirdguitarist! You can do the following to adjust the dataset format:

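from datasets import Dataset, Audio, Value, Features

# df has a "text" column (transcripts) and a "file" column (audio file paths)
dset = Dataset.from_pandas(df, preserve_index=False)
# set sampling_rate to what your model expects (16000 for XLSR-53)
features = Features({"text": Value("string"), "file": Audio(sampling_rate=...)})
dset = dset.cast(features)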

Kuldeep7688 · September 23, 2022, 12:05am

Hi, I kinda figured out how to load a custom dataset with different splits (train, test, valid):

Step 1: Create CSV files for your dataset (separate ones for train, test and valid). The columns will be “text”, “path” and “audio”. Keep the transcript in the “text” column and the audio file path in both the “path” and “audio” columns (the same value in each).

Step 2: Save the CSV files with appropriate names like train_data.csv, test_data.csv and valid_data.csv.
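Steps 1 and 2 can be scripted with Python’s standard library. The sketch below assumes you already have a list of `(transcript, audio_path)` pairs, for example read from your own annotation files; the helper names, the 10%/10% default split fractions and the fixed seed are illustrative choices, not part of the recipe in this post.

```python
import csv
import os
import random


def split_rows(rows, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (text, audio_path) pairs and cut them into train/test/valid lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible splits
    n_test = int(len(rows) * test_frac)
    n_valid = int(len(rows) * valid_frac)
    return {
        "test": rows[:n_test],
        "valid": rows[n_test:n_test + n_valid],
        "train": rows[n_test + n_valid:],
    }


def write_split_csvs(splits, out_dir="."):
    """Write <split>_data.csv files with "text", "path" and "audio" columns.

    As in Step 1, "path" and "audio" both hold the audio file path.
    Returns a mapping from split name to CSV file path.
    """
    data_files = {}
    for split, rows in splits.items():
        csv_path = os.path.join(out_dir, f"{split}_data.csv")
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text", "path", "audio"])
            for text, audio_path in rows:
                writer.writerow([text, audio_path, audio_path])
        data_files[split] = csv_path
    return data_files
```

The returned mapping has the same shape as the `data_files` argument of `load_dataset` in Step 4. A random utterance-level split is the simplest choice; splitting by speaker or by source recording instead keeps near-duplicate audio out of the test set.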

Step 3: Define the features as follows:

from datasets import Audio, Features, Value

features = Features(
    {
        "text": Value("string"),
        "path": Value("string"),
        "audio": Audio(sampling_rate=16000)
    }
)

Step 4: Load the dataset using the code below:

from datasets import load_dataset

sample_data = load_dataset(
    "csv",
    data_files={
        "train": "train_data.csv",
        "test": "test_data.csv",
        "valid": "valid_data.csv",
    },
)

When you print sample_data, you will get something like this:

DatasetDict({
    train: Dataset({
        features: ['text', 'path', 'audio'],
        num_rows: 10
    })
    test: Dataset({
        features: ['text', 'path', 'audio'],
        num_rows: 10
    })
    valid: Dataset({
        features: ['text', 'path', 'audio'],
        num_rows: 10
    })
})

Step 5: Cast the columns to the types specified in features using cast:

sample_data = sample_data.cast(features)

And you are done. After the cast, the "audio" column holds the file paths; when you access an example, the file is loaded from its path and decoded into a NumPy array at the given sampling rate (resampled if necessary).
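Because the Audio column only keeps paths until an example is accessed, a wrong path typically surfaces later, in the middle of preprocessing or training. A small pre-flight check over the CSV files can catch that early. The sketch below uses only the standard library; the helper name is hypothetical, and relative paths are checked against the current working directory.

```python
import csv
import os


def find_missing_audio(csv_paths, column="audio"):
    """Return (csv_file, row_number, audio_path) for every referenced file that does not exist.

    Row numbers count data rows from 1 (the header row is not counted).
    """
    missing = []
    for csv_path in csv_paths:
        with open(csv_path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f), start=1):
                audio_path = row[column]
                if not os.path.isfile(audio_path):
                    missing.append((csv_path, i, audio_path))
    return missing
```

Calling `find_missing_audio(["train_data.csv", "test_data.csv", "valid_data.csv"])` should return an empty list before you run `load_dataset`.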

mariosasko · September 26, 2022, 7:22pm

You can also pass the features directly to load_dataset now to perform the cast, which avoids an extra transformation (leading to less space used for caching).
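Applied to the CSV recipe above, Steps 3 to 5 collapse into a single call. A sketch, assuming the same three CSV columns ("text", "path", "audio"); the function name is hypothetical.

```python
def load_audio_splits(data_files, sampling_rate=16000):
    """Load CSV splits with the Audio feature applied at load time."""
    # The `datasets` package is only needed when this function is called.
    from datasets import Audio, Features, Value, load_dataset

    features = Features(
        {
            "text": Value("string"),
            "path": Value("string"),
            "audio": Audio(sampling_rate=sampling_rate),
        }
    )
    # Passing `features` here replaces the separate .cast(features) step.
    return load_dataset("csv", data_files=data_files, features=features)
```

Usage: `sample_data = load_audio_splits({"train": "train_data.csv", "test": "test_data.csv", "valid": "valid_data.csv"})`.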

Sarah · December 12, 2023, 1:14am

My local training dataset contains long audio recordings (1-2 hours each) with a timestamp for each sentence. What’s the proper way to load them?
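One common approach, sketched here under assumptions, is to cut each long recording into sentence-level clips using the timestamps, since feeding an hour-long recording to a wav2vec2-style model in one pass is generally impractical memory-wise. The helper name and the `(start_seconds, end_seconds, text)` annotation format are hypothetical, and decoding the long file into a waveform (with an audio library of your choice) is left out.

```python
def cut_segments(samples, sampling_rate, segments):
    """Slice one long recording into per-sentence clips.

    `samples` is the decoded waveform (a list or 1-D NumPy array) and
    `segments` holds (start_seconds, end_seconds, text) triples taken from
    the timestamp annotations (an assumed format).
    """
    clips = []
    n = len(samples)
    for start_s, end_s, text in segments:
        # Convert seconds to sample indices and clamp to the recording length.
        start = max(0, int(round(start_s * sampling_rate)))
        end = min(n, int(round(end_s * sampling_rate)))
        if end <= start:
            continue  # skip empty or inverted timestamps
        clips.append({"array": samples[start:end], "sampling_rate": sampling_rate, "text": text})
    return clips
```

Writing the clips out as individual audio files and listing them in CSV files, as in the steps earlier in this thread, then lets you reuse that loading recipe unchanged.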
