Misunderstanding around creating audio datasets from local files

Hi! Here is an example in Python:

```python
from datasets import Dataset, Audio

ds = Dataset.from_dict({
    "audio": ["path/to/audio_1", "path/to/audio_2", ..., "path/to/audio_n"],
    "transcription": ["First transcript", "Second transcript", ..., "Last transcript"],
}).cast_column("audio", Audio())
```
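If you don't want to type the paths by hand, you can collect them programmatically. A minimal sketch with the standard library, assuming a hypothetical folder of `.wav` files (the folder name `clips` and the helper name are illustrative, not from the original post):

```python
from pathlib import Path

def collect_audio_paths(folder):
    # Sort so the audio order is deterministic and lines up with the
    # transcription list you pair it with.
    return sorted(str(p) for p in Path(folder).glob("*.wav"))

# Hypothetical usage; returns [] if the folder does not exist.
paths = collect_audio_paths("clips")
```

The sorted list can then be passed as the `"audio"` column in `Dataset.from_dict`, with the transcriptions in the same order.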

Alternatively, you can define an AudioFolder (see the docs):

my_dataset/
β”œβ”€β”€ README.md
β”œβ”€β”€ metadata.csv
└── data/
    β”œβ”€β”€ audio_0.wav
    ...
    └── audio_n.wav
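The `metadata.csv` needs a `file_name` column with paths relative to the metadata file's location; the other columns become dataset columns. A sketch matching the layout above (the transcript values are illustrative):

```csv
file_name,transcription
data/audio_0.wav,First transcript
data/audio_n.wav,Last transcript
```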

Also, if I want to have two separate datasets, one for testing and one for training, what's the approach to follow? Upload everything and tag the split in the metadata.csv, or create two folders and upload the audio snippets/transcriptions into each?

You can structure your AudioFolder like this:

my_dataset/
β”œβ”€β”€ README.md
β”œβ”€β”€ metadata.csv
β”œβ”€β”€ test/
β”‚   β”œβ”€β”€ audio_0.wav
β”‚   ...
β”‚   └── audio_n.wav
└── train/
    β”œβ”€β”€ audio_0.wav
    ...
    └── audio_n.wav

It’s also possible to have one metadata.csv in train/ and one in test/ if you prefer keeping each split's metadata next to its files.
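A minimal sketch, using only the standard library, of writing one metadata.csv per split (folder, file, and transcript names are illustrative; the dataset can then be loaded with `load_dataset("audiofolder", data_dir="my_dataset")`, which detects the train/test splits from the folder names):

```python
import csv
from pathlib import Path

def write_split(root, split, rows):
    # Create <root>/<split>/metadata.csv listing (file_name, transcription) rows.
    split_dir = Path(root) / split
    split_dir.mkdir(parents=True, exist_ok=True)
    with open(split_dir / "metadata.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "transcription"])  # required header
        writer.writerows(rows)

# Hypothetical contents; the audio files themselves go next to each metadata.csv.
write_split("my_dataset", "train", [("audio_0.wav", "First transcript")])
write_split("my_dataset", "test", [("audio_1.wav", "Second transcript")])
```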
