Hello. In the past I’ve been able to download prior Common Voice releases using common_voice and the version number, but mozilla-foundation/common_voice_8_0 seems to give me memory issues.
The code snippet below downloads the dataset just fine but uses up all available memory in a Colab High-RAM environment when preparing the train set:
import datasets

# English config, train split; "my_auth_token" is a placeholder
dataset = datasets.load_dataset("mozilla-foundation/common_voice_8_0",
                                "en",
                                use_auth_token="my_auth_token",
                                split="train")
It’s the English dataset, so it takes a while to download. I reduced writer_batch_size in the hope that it would help, but no luck.
Any help would be greatly appreciated!