Hello. In the past I’ve been able to download prior Common Voice releases using common_voice and the version number, but mozilla-foundation/common_voice_8_0 seems to give me memory issues.
The code snippet below downloads the dataset just fine but uses up all available memory in a Colab High-RAM environment when preparing the train set:
import datasets
dataset = datasets.load_dataset("mozilla-foundation/common_voice_8_0", "en", use_auth_token="my_auth_token", split="train")
It’s the English dataset, so it takes a while to download. I reduced writer_batch_size in the hope that it would help, but no luck.
Any help would be greatly appreciated!