How to get file paths when iterating over a custom dataset with KeyDataset?

jblazek · October 6, 2023, 1:22pm

I am not sure if this is a nice solution, but it works…

First of all, create a metadata.csv in your data_dir that lists the file_name and some id for each file.
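In case it helps, here is a rough sketch of how the metadata.csv could be generated. It assumes the loader matches rows through a `file_name` column (as the folder-based loaders in 🤗 Datasets do, as far as I know). The helper name `write_metadata`, the `.mp3` filter and the use of the file stem as the id are my own assumptions, so adapt them to your data.

```python
import csv
from pathlib import Path


def write_metadata(data_dir, extensions=(".mp3",)):
    """Write data_dir/metadata.csv with one row per audio file.

    Hypothetical helper: the id is simply the file stem here;
    replace it with whatever id you actually need.
    """
    data_dir = Path(data_dir)
    # Collect audio files in a stable order; metadata.csv itself is skipped
    # because its suffix is not in `extensions`.
    files = sorted(
        p for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    out_path = data_dir / "metadata.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "id"])  # header: file name plus extra column
        for p in files:
            writer.writerow([p.name, p.stem])
    return out_path
```

Calling `write_metadata("path/to/data_dir")` once before loading should be enough; the extra `id` column then shows up as a regular field on each item.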
Load the dataset as before… Now each item carries both fields, for example:

dataset_mp3["train"][1] == {
   "audio": "path/to/audio/file",
   "id": "id from the metadata.csv"
}

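from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyPairDataset
# KeyPairDataset exposes the two chosen columns as "text" and "text_pair"
for item in tqdm(KeyPairDataset(dataset_mp3["train"], "audio", "id")):
    out = pipe(item["text"])
    print(out, item["text_pair"])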

It can be simplified: if you need just the file_name, iterate over a KeyDataset and run pipe inside the loop.
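To make the key names less mysterious, here is a small pure-Python sketch of both variants that runs without transformers installed. `PairView` is a hypothetical stand-in that mimics what KeyPairDataset yields as I understand it (the first column under "text", the second under "text_pair"), and `fake_pipe` is only a placeholder for the real pipeline, so treat this as an illustration rather than the library's actual code.

```python
class PairView:
    """Hypothetical stand-in for KeyPairDataset (not the real class)."""

    def __init__(self, rows, key1, key2):
        self.rows, self.key1, self.key2 = rows, key1, key2

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        row = self.rows[i]  # raises IndexError past the end, which stops iteration
        return {"text": row[self.key1], "text_pair": row[self.key2]}


def fake_pipe(audio_path):
    # Placeholder for the real pipeline call; returns a dummy result.
    return {"text": f"<transcript of {audio_path}>"}


# Made-up rows shaped like the items shown above.
rows = [
    {"audio": "clips/a.mp3", "id": "a"},
    {"audio": "clips/b.mp3", "id": "b"},
]

# Variant 1: keep the id next to each result via the pair view.
results = [
    (fake_pipe(item["text"]), item["text_pair"])
    for item in PairView(rows, "audio", "id")
]

# Variant 2: only the file path is needed, so iterate over a single column
# and call the pipeline inside the loop.
simple = [(fake_pipe(path), path) for path in (row["audio"] for row in rows)]
```

The point of calling the pipeline inside the loop is that each output stays paired with the item it came from, at the cost of losing the pipeline's own batching.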

Hope this helps.
