Character code errors still occur in 2024…
In some cases the error can be avoided by explicitly specifying the encoding at load time.
If that doesn't work, the problem may have another cause.
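For what it's worth, here is a minimal sketch of what "specifying it at load time" can look like. The packaged csv and text builders accept an `encoding` argument (the file path below is hypothetical); whether a script-based dataset honors it depends on the script itself.

```python
import datasets

# The packaged "csv" and "text" loaders forward `encoding` to the
# file reader, so non-UTF-8 local files can be decoded at load time.
dataset = datasets.load_dataset(
    "csv",
    data_files="my_french_corpus.csv",  # hypothetical path
    encoding="utf-16",
)
```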
I've moved on from that project at this point, so unfortunately I can't give a stack trace. But I will say I think it was more a data problem than a datasets problem. I still see the error with some of the data I'm using now, but I've started including chardet in my data pipeline, which seems to fix it (though it's a bit pokey).
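In case it helps anyone else, here is a minimal sketch of that kind of chardet pass (the file names are hypothetical, and detection is a heuristic, so the reported confidence is worth checking):

```python
import chardet

def to_utf8(src_path: str, dst_path: str) -> None:
    """Guess a file's encoding with chardet and rewrite it as UTF-8."""
    with open(src_path, "rb") as f:
        raw = f.read()
    guess = chardet.detect(raw)  # e.g. {'encoding': 'UTF-16', 'confidence': 0.99, ...}
    text = raw.decode(guess["encoding"] or "utf-8", errors="replace")
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)

to_utf8("train.raw.txt", "train.utf8.txt")  # hypothetical file names
```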
Thank you very much!!! Problem solved! Your answer really helps me a lot!
I'm working on a translator and hoping to fine-tune with a UTF-16 dataset so I can keep all the French accents, etc.
Datasets' load_dataset() doesn't seem to like non-UTF-8 files.
Is there a way to specify the encoding, or does it HAVE to be UTF-8?
If it has to be UTF-8, any suggestions for handling special characters? (A conversion sketch follows the snippets below.)
```python
import datasets

dataset = datasets.load_dataset(
    "jxu124/OpenX-Embodiment",
    "berkeley_gnm_cory_hall",
    streaming=False,
    split="train",
    cache_dir=ds_root,
    trust_remote_code=True,
    encoding="utf-16",
)
```

or

```python
dataset = datasets.load_dataset(
    "jxu124/OpenX-Embodiment",
    "berkeley_gnm_cory_hall",
    streaming=False,
    split="train",
    cache_dir=ds_root,
    trust_remote_code=True,
    encoding="utf-8",
)
```
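For reference, UTF-8 can encode every Unicode character, French accents included, so one workaround is to re-encode the files to UTF-8 before loading. A minimal sketch, with hypothetical file names:

```python
# Re-encode a UTF-16 file as UTF-8; accented characters like é, è, ç
# survive because UTF-8 covers the full Unicode range.
with open("train.utf16.csv", encoding="utf-16") as src, \
        open("train.utf8.csv", "w", encoding="utf-8") as dst:
    dst.write(src.read())
```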