I’m facing a rather weird issue when downloading larger models (more than 14B parameters, or more than 35-36 GB) from the Hub. The download stops and fails after some time with the following error message:
RuntimeError: Data processing error: CAS service error : IO Error: Too many open files (os error 24)
Before the error occurs, xet-core throws a lot of warnings that I can’t really decipher. It looks as if it is running in some kind of debug mode. I therefore assume that the error is caused by xet-core.
I have tried downloading multiple models (Qwen2.5-72B, Llama-3.3-70B, Mistral-small-24B, Qwen3-8B and Phi4). The error always occurs after downloading around 35-36 GB of data.
I’m on macOS 15.5 using Python 3.12, transformers 4.51.3, hf-xet 1.1.3 and huggingface-hub 0.32.4. I tried to download the models from a notebook cell, using both mlx-lm and transformers from_pretrained(), and via the huggingface-cli.
Has someone else encountered similar issues? Is it possible to avoid using xet and instead use LFS?
Hi @h4rz3rk4s3 - Xet team member here checking in - sorry you’re encountering this! @John6666 is correct, the Too many open files (os error 24) you’re encountering should’ve been addressed in a recent set of releases.
Could you tell me a bit more about your setup? Specifically:
Network speed
Disk setup
Any HF_XET* environment variables (you can list these by running env | grep "HF_XET")
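If you are working from a notebook rather than a shell, the same check can be sketched in Python. This is only a minimal sketch: the helper names xet_env_vars and open_file_limits are made up for illustration, and the file descriptor limit check is an extra diagnostic I am assuming is useful for an "os error 24", not something that was asked for above. The resource module is only available on Unix-like systems such as macOS and Linux.

```python
import os
import resource  # Unix-only (macOS, Linux); not available on Windows


def xet_env_vars(environ=os.environ) -> dict:
    """Roughly mirror `env | grep "HF_XET"`: keep entries whose NAME=value line contains HF_XET."""
    return {
        name: value
        for name, value in environ.items()
        if "HF_XET" in f"{name}={value}"
    }


def open_file_limits() -> tuple:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Print any HF_XET-related variables visible to this Python process.
    for name, value in sorted(xet_env_vars().items()):
        print(f"{name}={value}")

    # "Too many open files" is raised when the soft limit below is exhausted.
    soft, hard = open_file_limits()
    print(f"open file limit: soft={soft}, hard={hard}")
```

Run it from the same environment (same notebook kernel or shell) that performs the download, since both the environment variables and the limits are per process.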
To answer your question:
Is it possible to avoid using xet and instead use LFS?
There is an environment variable you can set to disable hf-xet and instead download a Xet-backed file using the LFS bridge - see HF_HUB_DISABLE_XET. While this isn’t exactly the same as using LFS, it may help in this specific instance.
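As a rough sketch of how that could look from Python: the variable is set before huggingface_hub is imported, on the assumption that the library reads it at import time. The helper name download_without_xet is hypothetical, and snapshot_download is used here only as one way to fetch a whole repository; substitute whatever download call you already use.

```python
import os

# Assumption: huggingface_hub reads this variable when it is first imported,
# so it has to be set before that import happens in this process.
os.environ["HF_HUB_DISABLE_XET"] = "1"


def download_without_xet(repo_id: str) -> str:
    """Download a full model repository with hf-xet disabled (hypothetical helper).

    Returns the local path of the downloaded snapshot.
    """
    # Imported here so the environment variable above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

In a notebook, restart the kernel first so that no earlier import of huggingface_hub has already picked up the old setting. From a shell, the equivalent would be to export HF_HUB_DISABLE_XET=1 before running huggingface-cli.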