Hi, following the HuggingFace documentation, I'm trying to load the "opus_books" dataset, and its webpage shows that there is an "en-fr" subset. However, when I run
get_dataset_config_names("opus_books")
I only get
['ca-de']
I don't understand why "en-fr" and all the other subsets are not listed here.
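A minimal sketch, assuming you want the script to stop early with a clear message instead of failing later with a cast error inside the builder: compare the requested config against the list returned by `get_dataset_config_names` before calling `load_dataset`. The helper `require_config` is hypothetical. It takes the config list as an argument, so the check itself runs offline without the `datasets` package.

```python
def require_config(dataset_name, requested, available):
    """Raise a descriptive error if `requested` is not among `available` configs.

    Hypothetical helper: `available` is expected to be the list returned by
    datasets.get_dataset_config_names(dataset_name).
    """
    if requested not in available:
        raise ValueError(
            f"Config {requested!r} not found for {dataset_name!r}; "
            f"available configs: {sorted(available)}"
        )
    return requested


# Intended usage (needs network access and the `datasets` package):
# from datasets import get_dataset_config_names, load_dataset
# configs = get_dataset_config_names("opus_books")
# require_config("opus_books", "en-fr", configs)
# ds_raw = load_dataset("opus_books", "en-fr", split="train")

if __name__ == "__main__":
    # With the config list reported in this post, the check fails immediately.
    try:
        require_config("opus_books", "en-fr", ["ca-de"])
    except ValueError as err:
        print(err)
```

With only `['ca-de']` returned, this surfaces the discrepancy before any download starts, which makes it clear the problem lies in what the library reports for the dataset, not in the `load_dataset` arguments.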
And when I run
load_dataset("opus_books", "en-fr", split='train')
I get the following output:
Downloading and preparing dataset None/ca-de to C:/Users/UserName/.cache/huggingface/datasets/parquet/ca-de-8239290e5e0370f8/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...
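The log line above already shows the mismatch: the builder is preparing `ca-de` (in a `parquet` cache directory), not the requested `en-fr`. The following is a hypothetical diagnostic, a sketch that only parses strings and assumes the log format stays exactly as printed in this post. It extracts the builder and config names from such a line so they can be compared with the config that was requested.

```python
import re

# Matches lines of the form
# "Downloading and preparing dataset <builder>/<config> to <cache path>..."
# (format assumed from the log in this post, not from a documented API).
PREPARE_RE = re.compile(r"Downloading and preparing dataset ([^/\s]+)/(\S+) to ")

LOG_LINE = (
    "Downloading and preparing dataset None/ca-de to "
    "C:/Users/UserName/.cache/huggingface/datasets/parquet/"
    "ca-de-8239290e5e0370f8/0.0.0/"
    "14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7..."
)


def prepared_config(log_line):
    """Return (builder_name, config_name) parsed from the log line, or None."""
    match = PREPARE_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    requested = "en-fr"
    builder, config = prepared_config(LOG_LINE)
    if config != requested:
        print(f"Requested {requested!r}, but the builder prepared {config!r}")
```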
After that, I get:
TypeError: Couldn't cast array of type
struct<ca: string, en: string>
to
struct<ca: string, de: string>
from
  File "train.py", line 154, in <module>
    train_model(config)
  File "train.py", line 87, in train_model
    train_dataloader, val_dataloader, tokenizer_src, tokenizer_tgt = get_ds(config)
  File "train.py", line 44, in get_ds
    ds_raw =load_dataset("opus_books", "en-fr", split='train')
  File "C:\Users\UserName\miniconda3\envs\base\lib\site-packages\datasets\load.py", line 1815, in load_dataset
    storage_options=storage_options,
  File "C:\Users\UserName\miniconda3\envs\base\lib\site-packages\datasets\builder.py", line 913, in download_and_prepare
    **download_and_prepare_kwargs,
  File "C:\Users\UserName\miniconda3\envs\base\lib\site-packages\datasets\builder.py", line 1004, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\UserName\miniconda3\envs\base\lib\site-packages\datasets\builder.py", line 1768, in _prepare_split
    gen_kwargs=gen_kwargs, job_id=job_id, **_prepare_split_args
  File "C:\UserName\miniconda3\envs\base\lib\site-packages\datasets\builder.py", line 1912, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
I don't understand what I did wrong. Could you please help me with this?