Question about loading the Wikipedia dataset

Hello, I am trying to download the Wikipedia dataset.
This is the code I tried:

from datasets import load_dataset
dataset = load_dataset("wikipedia", "20200501.ang", beam_runner='DirectRunner')

Then it shows:
FileNotFoundError: Couldn't find file at https://dumps.wikimedia.org/angwiki/20200501/dumpstatus.json

If I instead pick a more recent dump that is available at https://dumps.wikimedia.org/angwiki/ :

from datasets import load_dataset
dataset = load_dataset("wikipedia", "20200620.ang", beam_runner='DirectRunner')

It shows:
ValueError: BuilderConfig 20200620.ang not found. Available: ['20200501.aa', '20200501.ab', '20200501.ace', …]

Any advice? Thank you.

Do you need the English Wikipedia? If so, all you need is:
from datasets import load_dataset
dataset = load_dataset("wikipedia", "20200501.en", split="train")
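For other languages, it may help to check whether the dump behind a config name still exists before calling load_dataset. The URL in the FileNotFoundError above suggests the pattern https://dumps.wikimedia.org/<language>wiki/<date>/dumpstatus.json. Below is a minimal sketch under that assumption; dumpstatus_url and dump_exists are hypothetical helper names, not part of the datasets library, and the hyphen-to-underscore handling for language codes is also an assumption.

```python
import json
import urllib.error
import urllib.request


def dumpstatus_url(config_name: str) -> str:
    """Build the dumpstatus.json URL for a config name such as '20200501.ang'."""
    # Config names have the form '<date>.<language>'.
    date, language = config_name.split(".", 1)
    # Assumption: hyphens in language codes become underscores in dump names.
    wiki = language.replace("-", "_") + "wiki"
    return f"https://dumps.wikimedia.org/{wiki}/{date}/dumpstatus.json"


def dump_exists(config_name: str, timeout: float = 10.0) -> bool:
    """Return True if the dump's status file can be fetched (needs network access)."""
    try:
        with urllib.request.urlopen(dumpstatus_url(config_name), timeout=timeout) as resp:
            json.load(resp)  # the status file is expected to be valid JSON
        return True
    except (OSError, ValueError):
        # OSError covers HTTP errors (e.g. 404), URL errors, and timeouts;
        # ValueError covers a response that is not valid JSON.
        return False
```

For example, dumpstatus_url("20200501.ang") returns the same URL that appears in the error message, and dump_exists("20200501.ang") reports whether that file can still be fetched. Some versions of the wikipedia loading script also appear to accept explicit language and date keyword arguments in place of a predefined config name; treat that as an assumption to verify against your installed datasets version.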

Thanks. Problem solved.
