Problem loading HuggingFaceFW/fineweb-edu-score-2 dataset: Too Many Requests

Hello,

I am encountering an issue while loading the HuggingFaceFW/fineweb-edu-score-2 dataset for model training within a Bittensor project. I keep receiving a 429 Too Many Requests error, even though I have taken the following measures to reduce the load on the server:

  • Using --pages_per_epoch 1: I have limited the number of pages loaded per epoch to 1.
  • Implemented delays: I have added time.sleep(10) (or more) before each call to requests.get in the dataset.py file.
  • Verified internet connection: I have a stable internet connection with good ping to datasets-server.huggingface.co.
  • Authenticated via huggingface-cli login: I have successfully authenticated with Hugging Face via the command line.

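A fixed time.sleep(10) spaces requests out, but it does not react when the server actually answers 429. A common alternative is exponential backoff with jitter that also honors a Retry-After header when the server sends one. Here is a minimal sketch under stated assumptions: get_with_backoff and parse_retry_after are hypothetical helpers (not part of Bittensor or dataset.py), and fetch is assumed to return a requests.Response-like object with status_code and headers. Backoff only makes retries gentler; it cannot lift a server-side limit.

```python
import random
import time
from typing import Any, Callable, Optional


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable.

    Only the delay-seconds form is handled; the HTTP-date form is ignored.
    """
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def get_with_backoff(
    fetch: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() and retry on HTTP 429 with exponential backoff and jitter.

    Hypothetical helper: fetch is assumed to return an object exposing
    .status_code and .headers (like requests.Response). Returns the first
    non-429 response, or the last 429 response once retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Prefer the server's hint if it sent one.
        delay = parse_retry_after(response.headers.get("Retry-After"))
        if delay is None:
            # Otherwise double the wait each attempt and add random jitter
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            delay += random.uniform(0, delay / 2)
        sleep(min(delay, max_delay))
    return response
```

With requests, a call would look like get_with_backoff(lambda: requests.get(url, params=params, timeout=30)) followed by resp.raise_for_status(), where url and params stand in for whatever dataset.py currently passes.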
I am using the following code to load the dataset:

loader = pt.dataset.SubsetFineWebEdu2Loader(
    ...
    num_pages=config.pages_per_epoch,
    ...
)


Full traceback of the error:

[Insert the full traceback of the error here]

I suspect that the issue may be related to very strict rate limits on the Hugging Face Datasets server.

Could you please assist me in resolving this issue? Perhaps you could lift the rate limit for my account or suggest other solutions.

Thank you for your assistance!


At 35 TB, downloading the whole dataset isn't impossible, but we'd rather not… 😅

It looks like others have run into the same issue:

lhoestq
on Jan 10, 2025
Hi! This is caused by your old version of datasets, which calls HF with expand=True, an option that is heavily rate-limited.
Recent versions of datasets no longer rely on this, so you can fix the issue by upgrading datasets 🙂
pip install -U datasets
You can also get maximum HF availability on your compute nodes with HF Enterprise (see network security features).
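After running pip install -U datasets, it can help to confirm which version the training environment actually imports, since a virtualenv or container may still hold the old one. The following is a minimal sketch using only the standard library; parse_version, is_at_least, and installed_version are hypothetical helpers, and the thread does not say which datasets release stopped using expand=True, so any minimum version you compare against is an assumption you would need to confirm yourself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.14.0' or '3.0.1.dev0' into a tuple of leading integers.

    Simplified: pre-release/dev suffixes are dropped, not ordered.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as '0rc1'
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version("datasets")
    print(f"datasets installed: {current or 'not installed'}")
```

Running it inside the same environment that launches training shows whether the upgrade actually took effect there.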