load_dataset is extremely slow when loading HF datasets

Hello,
I hope someone can help me figure out a strange issue with load_dataset on my PC (Ubuntu 22.04 LTS).
It takes a very long time just to load a dataset from Hugging Face, such as “imdb” or “PolyAI/Banking77”.
No errors are raised (the screenshots below show the loads completing), but the time to complete is clearly abnormal:

  • 40 minutes to load a small dataset for the first time
  • 11 minutes on a second attempt with the same dataset
  • Under 1 minute on my Mac laptop (on the same Wi-Fi)
  • I tried reinstalling datasets (version 2.15.0) and several other fixes, but none of them resolved the issue. I tested my internet speed and it is completely normal (e.g., downloading models from HF runs at normal speed). Note that loading local datasets works normally.

Thank you!

Hi! I just tried on Colab and it’s quite fast now (30 sec for imdb, 5 sec for PolyAI/banking77). Maybe it was a transient issue?

The raw data files of those datasets are hosted on GitHub. Maybe their hosting was slow yesterday for some reason.
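
If it helps to check whether the slowness is still there, here is a minimal sketch for timing repeated loads of the same dataset. The helper names `time_load` and `report` are made up for illustration and are not part of the datasets library; the only library call used is `load_dataset`, imported lazily so the helper can be tried without the package installed.

```python
import time


def time_load(name, loader=None, repeats=2):
    """Call loader(name) `repeats` times and return the elapsed seconds of each call.

    `time_load` is a hypothetical helper, not part of the datasets library.
    """
    if loader is None:
        # Default to the real loader; requires `pip install datasets`.
        from datasets import load_dataset
        loader = load_dataset
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        loader(name)
        timings.append(time.perf_counter() - start)
    return timings


def report(name, timings):
    """Format the timings of each run as one readable line."""
    parts = ", ".join(f"run {i + 1}: {t:.1f}s" for i, t in enumerate(timings))
    return f"{name}: {parts}"
```

Calling `print(report("imdb", time_load("imdb")))` on both machines would give comparable numbers. The first run normally includes the download, while the second should mostly read from the local cache, so a second run that is still slow would suggest the problem is not only the remote hosting.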