DatasetGenerationError. Failed to parse string: as a scalar of type double

John6666 · January 7, 2025, 3:41pm

why it decided to encode it as float instead

I think it probably made that decision based on the first few lines…
It seems that external libraries (Pandas and PyArrow) are used for parsing CSV and JSON, so that is probably how it works. Options like on_bad_lines="skip" also seem to be passed straight through to them.
