UnicodeDecodeError: 'charmap' codec can't decode byte from load_dataset

Hello,
I tried to run load_dataset locally using:

from datasets import load_dataset

data = load_dataset(
    "dennlinger/eur-lex-sum",
    "english",
    trust_remote_code=True,
    download_mode="force_redownload",  # Ensure fresh download
    # encoding="utf-16",
)

However, I receive the following error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 3598:

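From what I found online, this seems to be an encoding issue: the data is presumably UTF-8, but it is being decoded with a different codec ('charmap' usually points to the Windows default, such as cp1252). However, it looks like the encoding parameter has been removed from load_dataset, so I cannot set it there.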
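For reference, here is a minimal sketch of what I think is going on. It uses only the standard library, not the dataset itself, and it assumes the underlying file is UTF-8 and is being read with cp1252. The file name sample.txt and the helper read_with are made up for illustration. The character "Ð" is encoded in UTF-8 as the bytes 0xC3 0x90, and 0x90 has no mapping in Python's cp1252 codec.

```python
import tempfile
from pathlib import Path


def read_with(path, encoding):
    """Return the file's text, or a short error description if decoding fails."""
    try:
        return Path(path).read_text(encoding=encoding)
    except UnicodeDecodeError as exc:
        bad_byte = exc.object[exc.start]
        return f"error: {exc.reason} (byte {bad_byte:#x})"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a UTF-8 data file from the dataset.
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("caf\u00d0".encode("utf-8"))  # ends in bytes C3 90

    as_cp1252 = read_with(sample, "cp1252")  # mimics the Windows default codec
    as_utf8 = read_with(sample, "utf-8")     # explicit UTF-8 decodes cleanly

print(as_cp1252)
print(as_utf8 == "caf\u00d0")
```

If this is indeed the cause, one thing that might be worth trying is Python's UTF-8 mode (setting the environment variable PYTHONUTF8=1 before starting Python, or running python -X utf8), which makes open() default to UTF-8. I have not confirmed that this resolves the problem for this particular dataset.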
