This is my column schema: features = Features({'Book_ID': Value('int32'), 'taraf_ID': Value('string'), 'Hadith_ID': Value('string'), 'matn': Value('string'), 'taraf_tally': Value('int32'), 'wordcount': Value('string'), 'Domain': Value('string'), 'Category': Value('string'), 'translation': Value('string')})
When I try to load this dataset using this code: dataset = load_dataset("FDSRashid/hadith_info", data_files='All_Matns.csv', token=string1, features=features)
I get the following error:
Failed to read file '/root/.cache/huggingface/datasets/downloads/ac7e243c60b61b8decc6fc884b4b76a7d6c12164953ec0f10a672362460a1bcd' with error <class 'ValueError'>: cannot safely convert passed user dtype of int32 for object dtyped data in column 4
ERROR:datasets.packaged_modules.csv.csv:Failed to read file '/root/.cache/huggingface/datasets/downloads/ac7e243c60b61b8decc6fc884b4b76a7d6c12164953ec0f10a672362460a1bcd' with error <class 'ValueError'>: cannot safely convert passed user dtype of int32 for object dtyped data in column 4
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
15 frames
ValueError: cannot safely convert passed user dtype of int32 for object dtyped data in column 4
The above exception was the direct cause of the following exception:
DatasetGenerationError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, job_id)
1956 if isinstance(e, SchemaInferenceError) and e.__context__ is not None:
1957 e = e.__context__
-> 1958 raise DatasetGenerationError("An error occurred while generating the dataset") from e
1959
1960 yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)
DatasetGenerationError: An error occurred while generating the dataset
I did successfully load everything when every column was typed as a string; apologies for the confusion. But if I have numeric data with some empty values, is the only way to load it to declare those columns as strings?
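
To narrow this down, it may help to check what is actually stored in the column the error points at. The sketch below uses only Python's standard `csv` module on a made-up inline sample, not the real All_Matns.csv. It assumes the "column 4" in the error is a zero-based index, which would correspond to taraf_tally in the schema above. The helper name `audit_int_column` and the sample rows are hypothetical.

```python
import csv
import io


def audit_int_column(csv_text, column):
    """Split the values of `column` into empty cells and non-integer text.

    Returns two lists: row numbers of empty cells, and (row number, value)
    pairs for cells that are neither empty nor parseable as an integer.
    """
    empty_rows = []
    bad_values = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows are numbered from 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        # A short row yields None for missing fields, so fall back to "".
        raw = (row[column] or "").strip()
        if raw == "":
            empty_rows.append(row_number)
            continue
        try:
            int(raw)
        except ValueError:
            bad_values.append((row_number, raw))
    return empty_rows, bad_values


# Hypothetical sample data, not taken from the real dataset.
sample = (
    "Book_ID,taraf_tally\n"
    "1,3\n"
    "2,\n"
    "3,n/a\n"
    "4,7\n"
)

empty_rows, bad_values = audit_int_column(sample, "taraf_tally")
print("empty cells at rows:", empty_rows)
print("non-integer values:", bad_values)
```

If the second list comes back non-empty for the real file, the column may hold more than just empty cells, and those values would presumably need cleaning before any integer type could apply.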