"Too many open files" when loading Common Voice

Ollie · January 27, 2022, 10:37pm

I’m trying to load the Common Voice dataset and I’m running into OSError: [Errno 24] Too many open files.

There’s only one line of code: ds = datasets.load_dataset("common_voice", "en", split="train+validation", version="6.1.0", cache_dir="gcs-data/common-voice"). It might be worth mentioning that cache_dir is a mounted cloud storage path.

The error occurs when the dataset is finalized and the temporary storage folder containing the Arrow tables is renamed.

I’m running Ubuntu with 32GB of RAM. ulimit -S and ulimit -H are both unlimited.
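
One caveat on those numbers: in bash, ulimit -S and ulimit -H with no resource flag report the file-size limit (-f), not the open-file limit (-n), so "unlimited" there may not say anything about file descriptors. Below is a minimal sketch of checking the per-process open-file limit from Python and raising the soft limit to the hard limit before loading. The helper names open_file_limits and raise_soft_limit_to_hard are made up for illustration; only the standard-library resource module is used, and whether this resolves the error on a mounted bucket is an assumption, not something confirmed here.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft open-file limit up to the hard limit.

    Returns the (soft, hard) limits in effect afterwards. If the OS
    refuses the change, the limits are left as they were.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the existing limits if the change is rejected
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", open_file_limits())
    print("after: ", raise_soft_limit_to_hard())
```

Calling raise_soft_limit_to_hard() at the top of the script, before datasets.load_dataset(...), would apply the higher limit to that process only. The shell equivalent for checking is ulimit -Sn and ulimit -Hn.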

Thanks in advance!
