RuntimeError: CAS service error when pushing large dataset to Hugging Face Hub

Hi everyone,

I’m trying to upload a dataset (~189k samples) to the Hugging Face Hub using the push_to_hub() method from the datasets library.

However, during the push step I get this error:

RuntimeError: Data processing error: CAS service error : ReqwestMiddleware Error: Request error: error sending request for url (https://cas-server.xethub.hf.co/xorb/default/84789a9e15313d792d35037979d8b278afad9f55bf22fe31104946e6e307bed6)

Some extra details:

  • Dataset is sharded into 2 parts.
  • I’m using the latest version of the datasets library.
  • Internet connection is stable.
  • No issues until the upload starts.

Has anyone else experienced this error? Is this a temporary issue on the Hugging Face backend or do I need to do something differently?

Thanks for any help or advice!


The symptoms are similar to an ongoing issue. As a workaround, would it be possible to try turning off Xet temporarily?

Hello! Thanks for the report @mahwizzzz

I don’t believe this is related to the issue @John6666 linked to; it may be something different. While we look into this, could you provide a few details?

The full method call you’re using, e.g.:

dataset.push_to_hub("<organization>/<dataset_id>", num_shards=1024, private=True ..... )

Information about your environment: OS, package versions, etc. (if you have huggingface-cli installed, you can simply run huggingface-cli env and paste its output). At a minimum, knowing your operating system and which version of hf-xet is installed would be very helpful.
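If the CLI isn't available, a rough stdlib-only Python sketch like the following gathers the same minimum details (OS, Python version, and the installed versions of datasets, huggingface_hub and hf-xet). It is not a replacement for huggingface-cli env; the package list is just an assumption and can be extended.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("datasets", "huggingface_hub", "hf-xet")) -> dict:
    """Collect OS, Python and package versions relevant to Hub uploads."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


# Print one "key: value" line per entry, ready to paste into a forum reply.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```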

If you retry and the error persists, you can upload with hf-xet disabled (export HF_HUB_DISABLE_XET=1) to unblock yourself. This falls back to the regular HTTP upload; you can unset the environment variable afterwards.
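A minimal Python sketch of that retry-then-fallback advice, under a few assumptions: the hf-xet failure surfaces as a RuntimeError (as in the error message in the original post), and huggingface_hub reads HF_HUB_DISABLE_XET once when it is first imported, so the flag must be set before importing datasets. The helper names retry and disable_xet, the backoff values, and the local dataset path are hypothetical.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def disable_xet() -> None:
    """Force the plain HTTP upload path.

    Assumption: huggingface_hub reads HF_HUB_DISABLE_XET at import time,
    so call this before importing datasets or huggingface_hub.
    """
    os.environ["HF_HUB_DISABLE_XET"] = "1"


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RuntimeError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError as err:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable")  # loop always returns or re-raises


# Usage sketch (repo id and local path are placeholders):
#   disable_xet()                      # only if you want the HTTP fallback
#   from datasets import load_from_disk
#   ds = load_from_disk("path/to/saved_dataset")
#   retry(lambda: ds.push_to_hub("<organization>/<dataset_id>", private=True))
```

Injecting sleep keeps the helper easy to test; in a real upload the default time.sleep is used.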
