Most of the time, when I try to load this dataset in Colab, it throws a “Not a directory” error:
NotADirectoryError: [Errno 20] Not a directory: '/root/.cache/huggingface/datasets/downloads/1bc05d24fa6dda2468e83a73cf6dc207226e01e3c48a507ea716dc0421da583b/cnn/stories'
I really don’t know why this happens or what the exact problem is.
I end up waiting hours, sometimes days, until I can load the dataset again.
Please help me solve this problem, or tell me how to save this dataset locally so that, once it can be downloaded again, I can load it from my Drive next time instead.
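A minimal sketch of the local-copy approach asked about here, using the `datasets` functions `load_dataset`, `save_to_disk` and `load_from_disk`. The helper name `load_with_local_copy` is hypothetical, and the dataset id `cnn_dailymail` with config `3.0.0` is an assumption inferred from the `cnn/stories` path in the error; the post itself does not name the dataset.

```python
import os
from typing import Optional


def load_with_local_copy(local_dir: str, name: str, config: Optional[str] = None):
    """Hypothetical helper: load a dataset, keeping a saved copy in `local_dir`.

    If `local_dir` already holds a saved copy, it is loaded from there and
    nothing is downloaded. Otherwise the dataset is downloaded once with
    `load_dataset` and written to `local_dir` with `save_to_disk`.
    """
    # Imported inside the function so the helper can be defined
    # even before `datasets` is installed.
    from datasets import load_dataset, load_from_disk

    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # A previous session already saved the dataset: skip the download.
        return load_from_disk(local_dir)

    # First run: download (this step can still hit the Google Drive quota).
    ds = load_dataset(name, config)
    # Persist the Arrow files so later sessions can skip the download.
    ds.save_to_disk(local_dir)
    return ds
```

In Colab, the copy only survives the session if it is written to Google Drive, so mount Drive first (`from google.colab import drive`, then `drive.mount("/content/drive")`) and pass a path under it, e.g. `load_with_local_copy("/content/drive/MyDrive/cnn_dailymail", "cnn_dailymail", "3.0.0")`, where the folder name is only an example. The first call still needs one successful download; every later call reads from Drive.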
This is caused by too many downloads from Google Drive, where the data is hosted; if you try again at a less busy time, it should work. The Datasets team is looking into this and will provide a fix (probably by hosting the data somewhere else).
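Since the failure is described as temporary, one stopgap is to retry the load after a pause. The helper below is a hypothetical sketch (`retry_with_backoff` and its parameters are illustrative names, not part of `datasets`): it retries only on `OSError`, of which `NotADirectoryError` is a subclass, and doubles the wait after each failed attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    load: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 60.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `load` until it succeeds or attempts run out.

    Only OSError (including NotADirectoryError) is retried; any other
    exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except OSError:
            if attempt == attempts:
                raise  # give up and re-raise the last error
            sleep(delay)  # wait before the next try
            delay *= factor  # exponential backoff
    raise AssertionError("unreachable: the loop either returns or raises")
```

A possible call is `retry_with_backoff(lambda: load_dataset("cnn_dailymail", "3.0.0", download_mode="force_redownload"))`, with the dataset id and config again assumed. Forcing a re-download may help if a failed attempt left a bad file in the cache that would otherwise be reused. The `sleep` parameter only exists so the waiting can be swapped out, e.g. in tests. If the quota stays exhausted for hours, as described in the question, retrying within one session will not help.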
Thank you, @merve and @nielsr. Unfortunately, @merve, your suggested approach didn’t solve the problem.
The issue seems complicated: this dataset is not hosted by Hugging Face, so, as @nielsr mentioned, we are bound by Google Drive’s download quota.
I hope Hugging Face gets permission to host this dataset or finds another solution, because this is really wasting our time.