Cnn_dailymail dataset loading problem with Colab

Ayham · January 1, 2022, 4:55am

Over the past few days, the cnn_dailymail dataset has rarely downloaded successfully.

import datasets
test_dataset = datasets.load_dataset("cnn_dailymail", "3.0.0", split="test")

Most of the time when I try to load this dataset using Colab, it throws a “Not a directory” error:

NotADirectoryError: [Errno 20] Not a directory: '/root/.cache/huggingface/datasets/downloads/1bc05d24fa6dda2468e83a73cf6dc207226e01e3c48a507ea716dc0421da583b/cnn/stories'

I really don’t know why this happens or what the exact problem is.

It wastes my time: I end up waiting hours or days until I can load the dataset again.

Please help me solve this problem, or show me how to save the dataset locally so that, once it becomes available again, I can load it from my Drive next time.

Thank you in advance.

merve · January 1, 2022, 9:00am

I would either try streaming, or clear the cache, mount Drive, and let it save under '/content'.
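A minimal sketch of both suggestions, assuming a recent version of the `datasets` library. The downloads path is taken from the traceback above; the helper names (`clear_downloads`, `load_streaming`, `load_into`) are made up for illustration, and neither option is guaranteed to get around the hosting problem.

```python
import shutil
from pathlib import Path

# Download cache location as it appears in the traceback above.
DEFAULT_DOWNLOADS = Path("/root/.cache/huggingface/datasets/downloads")


def clear_downloads(downloads_dir=DEFAULT_DOWNLOADS):
    """Delete cached downloads so the next load starts from scratch.

    Returns True if the directory existed and was removed, False otherwise.
    """
    downloads_dir = Path(downloads_dir)
    existed = downloads_dir.exists()
    if existed:
        shutil.rmtree(downloads_dir)
    return existed


def load_streaming():
    """Option 1: stream the test split instead of downloading it in full."""
    import datasets  # imported lazily so clear_downloads works without it

    return datasets.load_dataset(
        "cnn_dailymail", "3.0.0", split="test", streaming=True
    )


def load_into(cache_dir):
    """Option 2: download into an explicit directory, such as a mounted Drive folder."""
    import datasets

    return datasets.load_dataset(
        "cnn_dailymail", "3.0.0", split="test", cache_dir=cache_dir
    )
```

In Colab, Drive can be mounted first with `from google.colab import drive; drive.mount("/content/drive")`, after which something like `load_into("/content/drive/MyDrive/hf_datasets")` (a hypothetical folder name) would keep the files across sessions.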

nielsr · January 2, 2022, 1:34pm

This problem has been reported before; see issue #3465, "Unable to load 'cnn_dailymail' dataset", in the huggingface/datasets repository on GitHub.

This is due to too many downloads from Google Drive (where the data is hosted); if you try again in a less busy period, it will work. The Datasets team is looking into this and will provide a fix (probably by hosting the data somewhere else).
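Until a fix lands, "try again later" can be automated. The following is only a sketch: `load_with_retry` is a hypothetical helper, not part of `datasets`, and the delays are arbitrary assumptions rather than anything tied to how the Drive quota actually resets.

```python
import time


def load_with_retry(loader, attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call loader() until it succeeds, doubling the wait after each failure.

    loader is any zero-argument callable; sleep is injectable for testing.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return loader()
        except (NotADirectoryError, ConnectionError) as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A possible call, assuming `datasets` is installed, is `load_with_retry(lambda: datasets.load_dataset("cnn_dailymail", "3.0.0", split="test", download_mode="force_redownload"))`. Forcing a re-download may help because a failed download can otherwise be reused from the cache.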

Ayham · January 3, 2022, 6:09am

Thank you @merve and @nielsr. Unfortunately, @merve, that approach didn’t solve the problem.
The issue seems quite complicated: because this dataset is not hosted by Hugging Face, we are bound by Google Drive’s quota limits, as @nielsr mentioned.

I hope Hugging Face gets permission to host this dataset or finds some other solution, because this is really wasting our time.

Best Regards.

Ayham · February 17, 2022, 1:02pm

It seems that this copy of the dataset has fixed the problem
@merve @nielsr
