I am using Kaggle notebook kernels. I have a folder "traindata" in my working directory that contains audio files and a metadata.csv file (you can see the audio files and the .csv file in the output list below). The .csv file contains a column "file_name" with the file names of all the audio files. In my first notebook, when I load this data using the load_dataset function (Load audio data), there is no problem at all; it loads exactly as expected. However, when I run exactly the same code in a second Kaggle notebook, I get a FileNotFoundError. Full details are below. Can someone (@sanchit-gandhi) help me with this? I intend to use the second notebook for a different prediction model/approach using Whisper (I tried Wav2Vec2 in my first notebook and am not getting the desired result).
os.listdir("/kaggle/working/traindata")
['17d469e3c0f8.mp3',
'7f64f5ad7c72.mp3',
'792590a0c97b.mp3',
'40a932482ffa.mp3',
'20bff6808089.mp3',
'072c952790a8.mp3',
'b960faf6e6c9.mp3',
'7144e5a3951c.mp3',
'metadata.csv',
'e101861d7fc6.mp3',
'dec206df575b.mp3',
'fd9da8a487a6.mp3',
'd4bf563e8d74.mp3',
'a1db8eecfa15.mp3',
'c4de30d87c19.mp3',
'35ab3905df36.mp3',
'd0dcd7a9aa9d.mp3',
'd510c4a0f4c3.mp3',
'67a9e9be989d.mp3',
'665f9d30c16c.mp3',
'6cc0c4fcd376.mp3',
'bc11e6300bab.mp3',
'f4f100cc5126.mp3',
'82b81f884c22.mp3',
'e3e147532ab4.mp3',
'c4eb00849950.mp3',
'86c743dbdffc.mp3',
'1dc891cad82d.mp3',
'05bb928d483e.mp3',
'dcca4da43c55.mp3',
'2ff894872320.mp3',
'3df8624d57a5.mp3',
'bdce6383b6a3.mp3',
'9e156c339843.mp3',
'45c2362f6c24.mp3',
'802b1445767e.mp3',
'c6d7f0e0d016.mp3',
'151abc026c93.mp3',
'a1317f179adb.mp3',
'83e1b5ce808a.mp3',
'278ed2187132.mp3',
'90b8207f80a3.mp3',
'7989b39c6806.mp3',
'77738ec9edbc.mp3',
'3ea951d7af47.mp3',
'e3665ff03a0c.mp3',
'6c4e4d9823ac.mp3',
'27846b8f8edd.mp3',
'8df70760f935.mp3',
'd6f06a5c0e02.mp3',
'e2ab17915a45.mp3',
'16841afc8002.mp3',
'284eeb420025.mp3',
'99e567500082.mp3',
'aa101d9351a2.mp3',
'2d97709e1321.mp3',
'1bcdd2ab7204.mp3',
'f01dd698a636.mp3',
'52cb9dc45a60.mp3',
'56497258f4d4.mp3',
'b81628311b82.mp3',
'af9cfe48184c.mp3',
'87961540a611.mp3',
'aad63a719baf.mp3',
'67ff0d4f0abe.mp3',
'ba00881866dd.mp3',
'b3793565c709.mp3',
'7ceca0306fa1.mp3',
'9db958779825.mp3',
'24bc00853dfd.mp3',
'82facfcaf4af.mp3',
'9e849a13f4d2.mp3',
'5b7577a65f36.mp3',
'0e0cd7ae0a4b.mp3',
'948012d60dbc.mp3',
'dce2b585586b.mp3',
'cc29d450c5fc.mp3',
'178c4fbf1765.mp3',
'ab2905e4bc54.mp3',
'ef8953b95e6a.mp3',
'c6eda3ea8c01.mp3',
'244d9567f4ba.mp3',
'078b6f9629b2.mp3',
'75fb7aa16b1c.mp3',
'c7d2473e379f.mp3',
'd29197658275.mp3',
'9680b1e57366.mp3',
'403f13f1b957.mp3',
'85391aa9a25f.mp3',
'36229829da97.mp3',
'ec9c15cb1e78.mp3',
'84eb69de8f29.mp3',
'973f2efec47a.mp3',
'd4541f36bb70.mp3',
'15b565b3a352.mp3',
'9574a61cc33c.mp3',
'a8d6bf1285d5.mp3',
'62e2a304f67d.mp3',
'e7b575cc3a88.mp3',
'69a85e880d81.mp3',
'3beed37e15c5.mp3']
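To rule out a mismatch between metadata.csv and the folder contents, a small check like the one below could be run in both notebooks. It is only a sketch: the helper name `missing_audio_files` is hypothetical, and it assumes the CSV is UTF-8 with a header row containing the "file_name" column described above.

```python
import csv
import os


def missing_audio_files(data_dir, metadata_name="metadata.csv", column="file_name"):
    """Return the file names listed in the metadata CSV that do not exist in data_dir."""
    metadata_path = os.path.join(data_dir, metadata_name)
    missing = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first row as the header, so each row is keyed by column name.
        for row in csv.DictReader(handle):
            name = row[column]
            # Paths in the CSV are resolved relative to the data directory.
            if not os.path.isfile(os.path.join(data_dir, name)):
                missing.append(name)
    return missing
```

Calling `missing_audio_files("/kaggle/working/traindata")` should return an empty list if every entry in metadata.csv has a matching audio file.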
train_dataset = load_dataset("audiofolder", data_dir="/kaggle/working/traindata")
train_dataset
Traceback (most recent call last)
in <module>:1

❱ 1 train_dataset = load_dataset("audiofolder", data_dir="/kaggle/working/traindata", drop_l
  2
  3 train_dataset
  4

/opt/conda/lib/python3.10/site-packages/datasets/load.py:1664 in load_dataset

  1661     ignore_verifications = ignore_verifications or save_infos
  1662
  1663     # Create a dataset builder
❱ 1664     builder_instance = load_dataset_builder(
  1665         path=path,
  1666         name=name,
  1667         data_dir=data_dir,

/opt/conda/lib/python3.10/site-packages/datasets/load.py:1490 in load_dataset_builder

  1487     if use_auth_token is not None:
  1488         download_config = download_config.copy() if download_config else DownloadConfig(
  1489         download_config.use_auth_token = use_auth_token
❱ 1490     dataset_module = dataset_module_factory(
  1491         path,
  1492         revision=revision,
  1493         download_config=download_config,

/opt/conda/lib/python3.10/site-packages/datasets/load.py:1238 in dataset_module_factory

  1235                 if isinstance(e1, OfflineModeIsEnabled):
  1236                     raise ConnectionError(f"Couln't reach the Hugging Face Hub for datas
  1237                 if isinstance(e1, FileNotFoundError):
❱ 1238                     raise FileNotFoundError(
  1239                         f"Couldn't find a dataset script at {relative_to_absolute_path(c
  1240                         f"Couldn't find '{path}' on the Hugging Face Hub either: {type(e
  1241                     ) from None
FileNotFoundError: Couldn't find a dataset script at /kaggle/working/audiofolder/audiofolder.py or any data file in
the same directory. Couldn't find 'audiofolder' on the Hugging Face Hub either: FileNotFoundError: Couldn't find
file at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/audiofolder/audiofolder.py
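The message shows the library searching for an audiofolder.py script instead of treating "audiofolder" as a built-in loader. One possible explanation, which is an assumption and not something confirmed here, is that the two notebooks have different versions of the datasets package installed. A minimal sketch for comparing the environments, using only the standard library (the helper name `installed_versions` is hypothetical):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version string, or None if it is not installed."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            # The package is absent from this environment.
            versions[package] = None
    return versions


# Run this in both notebooks and compare the printed values.
print(installed_versions(["datasets", "transformers"]))
```

If the two notebooks print different versions for datasets, aligning them would be a reasonable first thing to try.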