Cannot load Audio dataset

pain · April 19, 2023, 8:58am

Hi,

I am trying to write a loading script for this dataset [pain/MASC · Datasets at Hugging Face].

I simply want to load the dataset as splits: train, validation, test, clean_dev, etc.

There is a problem with the column names, and I have tried many things to solve it with no luck ):

cd /home/lenovo/Desktop/MASC/masc ; /usr/bin/env /bin/python3 /home/lenovo/.vscode/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 60263 -- /home/lenovo/Desktop/MASC/masc/test.py 
Using custom data configuration MASC-f6ca6fe48e19e15b
Downloading and preparing dataset csv/MASC to /home/lenovo/Desktop/MASC/masc/MASC/csv/MASC-f6ca6fe48e19e15b/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a...
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4059.00it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1164.98it/s]
0 tables [00:00, ? tables/s]schema
video_id: string
category: string
video_duration: int64
channel_id: string
country: string
dialect: string
gender: string
transcript_duration: double
-- schema metadata --
huggingface: '{"info": {"features": {"video_id": {"dtype": "string", "id"' + 484
table.column_names
['video_id', 'start', 'end', 'duration', 'text']
features
['category', 'channel_id', 'country', 'dialect', 'gender', 'transcript_duration', 'video_duration', 'video_id']
Traceback (most recent call last):
  File "/home/lenovo/Desktop/MASC/masc/test.py", line 3, in <module>
    masc = load_dataset("/home/lenovo/Desktop/MASC/masc/MASC", cache_dir="/home/lenovo/Desktop/MASC/masc/MASC", data_dir="data")
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/load.py", line 1746, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/builder.py", line 704, in download_and_prepare
    self._download_and_prepare(
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/builder.py", line 793, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/builder.py", line 1277, in _prepare_split
    writer.write_table(table)
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/arrow_writer.py", line 524, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/table.py", line 2011, in table_cast
    return cast_table_to_schema(table, schema)
  File "/home/lenovo/.local/lib/python3.8/site-packages/datasets/table.py", line 1974, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
video_id: string
start: double
end: double
duration: double
text: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 820
to
{'video_id': Value(dtype='string', id=None), 'category': Value(dtype='string', id=None), 'video_duration': Value(dtype='int64', id=None), 'channel_id': Value(dtype='string', id=None), 'country': Value(dtype='string', id=None), 'dialect': Value(dtype='string', id=None), 'gender': Value(dtype='string', id=None), 'transcript_duration': Value(dtype='float64', id=None)}
because column names don't match
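
The traceback suggests that CSV files with two different layouts are being read into the same split: the builder expects the metadata columns (video_id, category, ...) but then hits a table with the transcript columns (video_id, start, end, duration, text). A minimal sketch for checking this, assuming the data directory holds plain CSV files, is to group the files by their header row. The file names and rows below are hypothetical stand-ins, not the real MASC files:

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def group_csvs_by_header(paths):
    """Map each distinct header (tuple of column names) to the files that use it."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            # Read only the first row; an empty file yields an empty header.
            header = tuple(next(csv.reader(f), []))
        groups[header].append(str(path))
    return dict(groups)


# Hypothetical stand-ins: one metadata-style CSV and one transcript-style CSV,
# using the two column layouts that appear in the error message.
tmp = Path(tempfile.mkdtemp())
(tmp / "metadata.csv").write_text(
    "video_id,category,video_duration,channel_id,country,dialect,gender,transcript_duration\n"
    "abc,news,120,ch1,SA,najdi,male,100.5\n",
    encoding="utf-8",
)
(tmp / "subtitles.csv").write_text(
    "video_id,start,end,duration,text\n"
    "abc,0.0,2.5,2.5,hello\n",
    encoding="utf-8",
)

groups = group_csvs_by_header(sorted(tmp.glob("*.csv")))
for header, files in groups.items():
    print(list(header), "->", [Path(p).name for p in files])
```

If more than one group comes back, each group could probably be loaded on its own, for example with load_dataset("csv", data_files=files), so that the builder never has to cast one layout to the other.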

fkov · June 11, 2023, 8:21am

I have a similar issue.
