Hi guys,
I was using datasets==1.5.0 while preprocessing and saving my dataset. After updating to the latest datasets version (1.6.1), I can no longer export my datasets as Parquet files.
With version 1.5.0 I could just do:
import pyarrow.parquet as pq
...
...
pq.write_table(train_dataset.data, 'train.parquet')
pq.write_table(eval_dataset.data, 'eval.parquet')
When I run the same code with the latest datasets version, I get:
  File "../preprocess_dataset.py", line 132, in <module>
    pq.write_table(train_dataset.data, f'{resampled_data_dir}/{data_args.dataset_config_name}.train.parquet')
  File "/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py", line 1674, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py", line 588, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
TypeError: Argument 'table' has incorrect type (expected pyarrow.lib.Table, got ConcatenationTable)
Should I just stay on 1.5.0, or is there a quick and easy workaround?
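One possible workaround, as an unverified sketch: it assumes that the `ConcatenationTable` now returned by `.data` is a wrapper that keeps the real `pyarrow.Table` in a `.table` attribute, which I have not confirmed against the 1.6.1 source. The helper names `unwrap_arrow_table` and `write_parquet` are made up for illustration.

```python
def unwrap_arrow_table(data):
    """Return the wrapped pyarrow.Table if `data` exposes one via `.table`.

    Assumption: the wrapper classes in newer datasets versions store the
    underlying pyarrow.Table in a `.table` attribute. A plain pyarrow.Table
    (datasets 1.5.0 behaviour) has no such attribute and is returned as-is.
    """
    return getattr(data, "table", data)


def write_parquet(dataset, path):
    """Write `dataset.data` to `path` as a Parquet file (hypothetical helper)."""
    # Imported here so that unwrap_arrow_table stays dependency-free.
    import pyarrow.parquet as pq

    pq.write_table(unwrap_arrow_table(dataset.data), path)
```

With this, the two original calls would become `write_parquet(train_dataset, 'train.parquet')` and `write_parquet(eval_dataset, 'eval.parquet')`, and the same code should keep working on 1.5.0 because a plain table is passed through unchanged.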
I'm not that familiar with Python. In Java I could just use one version for one project and another version for another project. Can / should I do the same here?