Hi. I’m having trouble loading the Dolma dataset.
After downloading all the files, I try to load them like this:
dataset = load_dataset("allenai/dolma", split="train", data_dir='/nvmedata/dolma_files/', cache_dir="/nvmedata/cache")
However, the load_dataset function invariably tries to download more data. It seems that it does not want to use the files I have already downloaded.
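
One workaround I could try is to skip the "allenai/dolma" loading script entirely and point the generic "json" builder at the local files. This is only a sketch: it assumes the downloaded shards are gzip-compressed JSON Lines files ending in .json.gz (I have not confirmed the exact layout), and the helper names find_local_shards and load_local_dolma are made up for illustration.

```python
import glob
import os


def find_local_shards(root, pattern="**/*.json.gz"):
    """Return a sorted list of shard files under root.

    The default pattern is an assumption about how the files are named.
    """
    return sorted(glob.glob(os.path.join(root, pattern), recursive=True))


def load_local_dolma(root, cache_dir):
    """Load the local shards with the generic "json" builder (hypothetical helper)."""
    # Imported lazily so find_local_shards works without the library installed.
    from datasets import load_dataset

    files = find_local_shards(root)
    if not files:
        raise FileNotFoundError(f"no files matching the pattern under {root}")
    # Passing data_files to the "json" builder reads the local files directly
    # instead of running the dataset's own loading script.
    return load_dataset("json", data_files=files, split="train", cache_dir=cache_dir)
```

With the paths from my call above, this would be load_local_dolma("/nvmedata/dolma_files/", "/nvmedata/cache"). If the shards turn out to have a different extension or format, the pattern and the builder name would need to change accordingly.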