Hi, a couple of questions:
1- I have a training folder containing thousands of mp3 files and a mapping.csv that lists each file's path and transcription. I also have a test folder with thousands of files and its own mapping.csv with the path and transcription.
I'm creating a dataset from these local files, and when fine-tuning on it I want the train data to be used for training and the test data for testing.
In the documentation it's not clear how the splits are separated, or how to label one folder as train and the other as test.
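One possible approach, sketched under assumptions: in Hugging Face `datasets` a split is just a key in a `DatasetDict`, so each folder can be read into its own columns and the results placed under the "train" and "test" keys. The helper below uses only the standard library. The column names `path` and `transcription`, the file name `mapping.csv` inside each folder, and the rule that relative paths are resolved against that folder are all assumptions to adjust to your files; `read_mapping` and `build_splits` are hypothetical helper names.

```python
import csv
from pathlib import Path

# Assumption: each mapping.csv has a header row with columns named
# "path" and "transcription"; change these to match your CSVs.
PATH_COL = "path"
TEXT_COL = "transcription"


def read_mapping(folder, mapping_name="mapping.csv"):
    """Read one split's mapping CSV into column lists for Dataset.from_dict."""
    folder = Path(folder)
    audio, text = [], []
    with open(folder / mapping_name, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            p = Path(row[PATH_COL])
            # Assumption: relative paths in the CSV are relative to the
            # folder that holds the CSV; absolute paths are kept as-is.
            if not p.is_absolute():
                p = folder / p
            audio.append(str(p))
            text.append(row[TEXT_COL])
    return {"audio": audio, "transcription": text}


def build_splits(train_dir, test_dir):
    """Return {"train": columns, "test": columns}; the key is the split label."""
    return {"train": read_mapping(train_dir), "test": read_mapping(test_dir)}
```

Each inner dict could then be passed to `Dataset.from_dict(...).cast_column("audio", Audio())`, and the two results wrapped as `DatasetDict({"train": train_ds, "test": test_ds})` (hypothetical variable names), so that `dataset["train"]` and `dataset["test"]` are the labeled splits handed to the trainer.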
This is the example from the documentation:
```python
>>> from datasets import Dataset, Audio
>>> audio_dataset = Dataset.from_dict({"audio": ["path/to/audio_1", "path/to/audio_2", ..., "path/to/audio_n"]}).cast_column("audio", Audio())
>>> audio_dataset[0]["audio"]
{'array': array([ 0. , 0.00024414, -0.00024414, ..., -0.00024414,
         0. , 0. ], dtype=float32),
 'path': 'path/to/audio_1',
 'sampling_rate': 16000}
```
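Because the `Audio()` feature decodes a file when an example is accessed (as in `audio_dataset[0]["audio"]`), a wrong path in a mapping CSV may only surface partway through training. A small pre-flight check can catch that, along with the same file accidentally appearing in both splits. This sketch assumes the paths for each split have been gathered into a dict shaped like `{"train": {"audio": [...]}, "test": {"audio": [...]}}`, the column layout `Dataset.from_dict` accepts; `preflight` is a hypothetical helper name.

```python
from pathlib import Path


def preflight(splits):
    """Report missing audio files per split and files shared by train and test.

    `splits` is assumed to look like {"train": {"audio": [...]}, "test": {"audio": [...]}}.
    """
    missing = {
        name: [p for p in cols["audio"] if not Path(p).is_file()]
        for name, cols in splits.items()
    }
    # Compare resolved paths so different spellings of one file match.
    train = {Path(p).resolve() for p in splits.get("train", {}).get("audio", [])}
    test = {Path(p).resolve() for p in splits.get("test", {}).get("audio", [])}
    return {"missing": missing, "overlap": sorted(str(p) for p in train & test)}
```

This only detects the same file path in both splits, not duplicate recordings stored under different names.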
Please help here