"Too many open files" when loading Common Voice

I’m trying to load the Common Voice dataset and I keep running into OSError: [Errno 24] Too many open files.

There’s only one line of code:

ds = datasets.load_dataset("common_voice", "en", split="train+validation", version="6.1.0", cache_dir="gcs-data/common-voice")

It may be relevant that cache_dir is a mounted cloud storage path.

The error occurs while the dataset is being finalized, at the point where the temporary storage folder containing the Arrow tables is renamed.

I’m running Ubuntu with 32 GB of RAM. ulimit -S and ulimit -H both report unlimited.
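
One thing worth double-checking here: in bash, ulimit with no resource flag reports the file-size limit (-f), whereas Errno 24 relates to the open-file-descriptor limit (ulimit -n, RLIMIT_NOFILE). A minimal sketch for inspecting that limit from inside the Python process, using the standard-library resource module, might look like the following. The helper names open_file_limits and raise_soft_limit_to_hard are made up for illustration, and whether raising the limit actually resolves the error with a mounted cache_dir is an assumption, not something confirmed here.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit to the hard limit; return the resulting limits.

    Hypothetical helper: it only affects the current process and its children.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values; keep the old limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", open_file_limits())
    print("after: ", raise_soft_limit_to_hard())
```

If the soft limit printed here is a small number such as 1024, the limit is not actually unlimited for this process, and calling the second helper before load_dataset would be one thing to try.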

Thanks in advance!