Dataset not downloading

Hi there,
I am trying to create a new dataset: mehdie/sefaria · Datasets at Hugging Face
When I try to use it to train a tokenizer, the data itself (data directory) does not get downloaded. I get 0 files and 0 records. Did I do something wrong? I have a custom _split_generators function. Maybe something needs to be done there?
Thanks
Tomer

Hi! In your script you seem to use glob, but it's not necessary.

You can use dl_manager.download() to download a list of parquet files, and it will return the list of paths to the downloaded files.
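
For example, a minimal _split_generators along those lines could look like this (a sketch, not your actual script; the file list, builder name, and column handling are assumptions):

import datasets
import pyarrow.parquet as pq

# Hypothetical file list; in a loading script, relative paths are
# resolved against the dataset repository itself.
_DATA_FILES = [
    "data/Chasidut_english.parquet",
]

class Sefaria(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("1.0.0")

    def _info(self):
        # Features omitted for brevity; datasets can infer them.
        return datasets.DatasetInfo()

    def _split_generators(self, dl_manager):
        # download() accepts a list and returns the local paths of the
        # downloaded files, in the same order. No glob needed.
        filepaths = dl_manager.download(_DATA_FILES)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"filepaths": filepaths},
            )
        ]

    def _generate_examples(self, filepaths):
        for path in filepaths:
            table = pq.read_table(path)
            for idx, row in enumerate(table.to_pylist()):
                yield f"{path}-{idx}", row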

Hi,
The dl_manager.download() function returns a list of text files that are around 132 bytes each and contain things like this:

version https://git-lfs.github.com/spec/v1
oid sha256:08f7e86610f17d1addd6999e39c969aea926902d0cf616e8d5bd90f41ba124d1
size 7518478
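
A quick way to confirm that a downloaded file is one of these Git LFS pointers rather than actual parquet data (a sketch; path is any of the returned files):

# Git LFS pointer files start with the spec line shown above,
# whereas a real parquet file starts with the magic bytes "PAR1".
def is_lfs_pointer(path):
    with open(path, "rb") as f:
        return f.read(12) == b"version http"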

Interestingly, in the same directory, each of these files is accompanied by a file with the same name plus a .json extension that contains the URL of one of my missing files:

{"url": "https://huggingface.co/datasets/mehdie/sefaria/raw/main/data/Chasidut_english.parquet", "etag": null}

The URLs were not pointing to the files themselves, but to their Git LFS pointer files. You should use

https://huggingface.co/datasets/mehdie/sefaria/resolve/main/data/Chasidut_english.parquet

instead of

https://huggingface.co/datasets/mehdie/sefaria/raw/main/data/Chasidut_english.parquet
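
If you'd rather not hand-write these URLs, huggingface_hub can build the /resolve/ form for you (a small sketch):

from huggingface_hub import hf_hub_url

# hf_hub_url builds the /resolve/ URL, which redirects to the actual
# LFS object instead of returning the pointer file like /raw/ does.
url = hf_hub_url(
    repo_id="mehdie/sefaria",
    filename="data/Chasidut_english.parquet",
    repo_type="dataset",
)
print(url)
# https://huggingface.co/datasets/mehdie/sefaria/resolve/main/data/Chasidut_english.parquet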