Strange Error While Attempting to Load DataSet

FDSRashid · September 24, 2023, 5:40pm

This is my column schema: features = Features({'Book_ID': Value('int32'), 'taraf_ID': Value('string'), 'Hadith_ID': Value('string'), 'matn': Value('string'), 'taraf_tally': Value('int32'), 'wordcount': Value('string'), 'Domain': Value('string'), 'Category': Value('string'), 'translation': Value('string')}). When I try to load the dataset using this code: dataset = load_dataset("FDSRashid/hadith_info", data_files = 'All_Matns.csv', token = string1, features = features), I get the following error:

Failed to read file '/root/.cache/huggingface/datasets/downloads/ac7e243c60b61b8decc6fc884b4b76a7d6c12164953ec0f10a672362460a1bcd' with error <class 'ValueError'>: cannot safely convert passed user dtype of int32 for object dtyped data in column 4
ERROR:datasets.packaged_modules.csv.csv:Failed to read file '/root/.cache/huggingface/datasets/downloads/ac7e243c60b61b8decc6fc884b4b76a7d6c12164953ec0f10a672362460a1bcd' with error <class 'ValueError'>: cannot safely convert passed user dtype of int32 for object dtyped data in column 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
15 frames
ValueError: cannot safely convert passed user dtype of int32 for object dtyped data in column 4

The above exception was the direct cause of the following exception:

DatasetGenerationError                    Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, job_id)
   1956             if isinstance(e, SchemaInferenceError) and e.__context__ is not None:
   1957                 e = e.__context__
-> 1958             raise DatasetGenerationError("An error occurred while generating the dataset") from e
   1959 
   1960         yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)

DatasetGenerationError: An error occurred while generating the dataset

I did successfully load everything when the columns were declared as strings; apologies for the confusion. But if I have numeric data with some empty values, is passing them as strings the only way to load them?
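
For anyone hitting the same message: "object dtyped data in column 4" suggests pandas found at least one cell in that column it could not read as a number, which may be more than just empty cells. Below is a rough standard-library sketch for checking a column before declaring it as int32. The helper names (parse_optional_int, find_bad_int_cells) and the sample rows are made up for illustration; they are not part of datasets or pandas, and the real contents of All_Matns.csv are not known here.

```python
import csv
import io


def parse_optional_int(text):
    """Return int(text), or None when the cell is empty or only whitespace."""
    text = text.strip()
    return int(text) if text else None


def find_bad_int_cells(csv_text, column):
    """Return (record_number, raw_value) pairs for cells in `column`
    that are neither empty nor valid integers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    # The header is record 1, so data records start at 2.
    for record_number, row in enumerate(reader, start=2):
        raw = row[column]
        try:
            parse_optional_int(raw)
        except ValueError:
            bad.append((record_number, raw))
    return bad


# Hypothetical sample data: one valid value, one empty cell, one stray string.
sample = "Book_ID,taraf_tally\n1,3\n2,\n3,n/a\n"

bad_cells = find_bad_int_cells(sample, "taraf_tally")
print(bad_cells)
```

On a real file, the same check could be run on the text read from the CSV, with "taraf_tally" (column 4 in the schema above, counting from 0) as the column. Empty cells come back as None here, so only the flagged cells would need cleaning. Values such as "3.0" are also flagged, since int() rejects them.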
