Hey there, I've been downloading the Wikipedia corpus, and I came across this script to download the Wikipedia dump for a specific language at a given date:
import datasets
lang_dataset = datasets.load_dataset("wikipedia", "20220301.hi", beam_runner="DirectRunner")
My question is: does this download all the text that's available on Wikipedia for the given language, or is it limited to the text that was added or updated on Wikipedia on that specific date?
I actually need to download all the data Wikipedia has for the given language. How do I do that?
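For reference, here is a minimal sketch of how I am reading the config string in the script. It assumes the name has the form "<dump date as YYYYMMDD>.<language code>", so "20220301.hi" would mean the 2022-03-01 dump for Hindi. The helper names are hypothetical and only there to illustrate my understanding; they are not part of the datasets library.

```python
def split_wikipedia_config(config_name):
    """Split a config name such as "20220301.hi" into (dump_date, language).

    Assumption: the name is "<YYYYMMDD>.<language code>".
    """
    dump_date, _, language = config_name.partition(".")
    if not (dump_date.isdigit() and len(dump_date) == 8 and language):
        raise ValueError(f"unexpected config name: {config_name!r}")
    return dump_date, language


def build_wikipedia_config(dump_date, language):
    """Build a config name from a YYYYMMDD dump date and a language code."""
    return f"{dump_date}.{language}"


# Hypothetical usage: take apart the config name used in the script above.
dump_date, language = split_wikipedia_config("20220301.hi")
print(dump_date, language)
```

If that reading is right, changing the language code or the date in the string would select a different dump, but I am not sure what exactly the date part covers.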