Error occurs when loading dataset with load_dataset()

Hi all! I’m new to HF and AI. When I tried to load a dataset downloaded from HF, I got an error. Here is my code:

from datasets import load_dataset

wiki_set = load_dataset("wikipedia", "20231101.en", split="train")

The error is:

raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

I searched online and found that load_from_disk() may work for me. However, the official documentation says load_dataset() can also load datasets locally, so I’d like to know how to modify my code to fix the error.


Hi there @eelearner,
Can you provide a link to the Wikipedia dataset you’re referring to?
The one I found: wikipedia · Datasets at Hugging Face
It seems to provide only the “20220301.en” subset.

wiki_set = load_dataset("wikipedia", "20220301.en", split="train")

should work in that case.
Another possibility is that you scraped Wikipedia more recently and want to load your scrape into an HF dataset. In that case, you’d need to share more information.

Hi! We plan to deprecate this dataset in the next few days. Please use the following instead:

ds = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")


Hi, this is the Wikipedia dataset I’m referring to: wikimedia/wikipedia
Thanks for the help. I found the problem: I downloaded the dataset using git, but the checkout failed, which means the dataset wasn’t fully downloaded. I solved it a few minutes ago!
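If you clone a dataset repo with git, one common way an incomplete download shows up (an assumption here; the thread doesn't say exactly how the checkout failed) is that large data files are still Git LFS pointer stubs, i.e. small text files, instead of real Parquet data. A quick stdlib-only sketch to spot them; the function name find_lfs_pointers and the *.parquet glob are my own hypothetical choices:

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return the *.parquet files under repo_dir that are still LFS pointer stubs.

    Hypothetical helper: it only inspects the first bytes of each file and does
    not verify the content of files that were actually downloaded.
    """
    stubs = []
    for path in sorted(Path(repo_dir).rglob("*.parquet")):
        with open(path, "rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(path)
    return stubs
```

If it returns any paths for your clone, running `git lfs pull` inside the clone should fetch the real files before you call load_dataset() on them.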

Yeah. Thanks for reminding!