Hi @lhoestq, thanks for the solution. I followed that approach, but I am getting errors when I try to merge two datasets:
dataset_ar = load_dataset('wikipedia', language='ar', date='20210320', beam_runner='DirectRunner')
dataset_bn = load_dataset('wikipedia', language='bn', date='20210320', beam_runner='DirectRunner')
I tried two ways to concatenate them, but both approaches give errors. Could you please help me find out what I am missing? Thanks.
First Approach
dataset_cc = concatenate_datasets(dataset_ar, dataset_bn)
Traceback (most recent call last):
File "", line 1, in
File "/home/anaconda3/envs/nlp/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3135, in concatenate_datasets
if axis == 0 and not all([dset.features.type == dsets[0].features.type for dset in dsets]):
File "/home/anaconda3/envs/nlp/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3135, in
if axis == 0 and not all([dset.features.type == dsets[0].features.type for dset in dsets]):
AttributeError: 'str' object has no attribute 'features'
Second Approach
dataset_cc = concatenate_datasets(dataset_ar['train'], dataset_bn['train'])
Traceback (most recent call last):
File "", line 1, in
File "/home/anaconda3/envs/nlp/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3135, in concatenate_datasets
if axis == 0 and not all([dset.features.type == dsets[0].features.type for dset in dsets]):
File "/home/anaconda3/envs/nlp/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3135, in
if axis == 0 and not all([dset.features.type == dsets[0].features.type for dset in dsets]):
AttributeError: 'dict' object has no attribute 'features'