Load dataset from files already downloaded

Hi. I’m having trouble loading the Dolma dataset.

After downloading all the files, I try to load them like this:

dataset = load_dataset("allenai/dolma", split="train", data_dir='/nvmedata/dolma_files/', cache_dir="/nvmedata/cache")

However, load_dataset invariably tries to download more data. It seems to ignore the files I have already downloaded.
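One way to rule out the Hub-side loader for "allenai/dolma" is to bypass it and point the generic "json" builder from `datasets` at the local shards directly. The sketch below makes some assumptions. The shards are assumed to be gzip-compressed JSON Lines named `*.json.gz` (the folder listing isn't shown, so adjust `pattern`). The helper names `find_local_files` and `load_local_dolma` are hypothetical. The paths come from the post.

```python
import glob
import os


def find_local_files(data_dir, pattern="*.json.gz"):
    """Return a sorted list of files in data_dir that match pattern.

    Raises FileNotFoundError when nothing matches, so a wrong path or
    extension fails loudly instead of silently producing nothing.
    """
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir!r}")
    return files


def load_local_dolma(data_dir, cache_dir=None, pattern="*.json.gz"):
    """Load already-downloaded shards with the generic "json" builder.

    Assumption: the shards are JSON Lines, optionally gzip-compressed
    (compression is inferred from the file extension). The "json" builder
    reads only the files listed in data_files; it does not run the
    allenai/dolma loader from the Hub.
    """
    # Imported lazily so find_local_files works even without `datasets`.
    from datasets import load_dataset

    files = find_local_files(data_dir, pattern)
    return load_dataset(
        "json",
        data_files={"train": files},
        split="train",
        cache_dir=cache_dir,
    )
```

With the layout from the post, the call might look like `load_local_dolma("/nvmedata/dolma_files/", cache_dir="/nvmedata/cache")`. You can also set the environment variable `HF_DATASETS_OFFLINE=1` before importing `datasets`. Then any remaining attempt to fetch remote files fails immediately instead of starting a download.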

Files are all in one folder like this: