Using local dataset without changing cache

Hi! load_dataset saves a generated dataset in the Arrow format to be able to memory-map it later, so this is the expected behavior. If you are not interested in random access, you can pass streaming=True to get an IterableDataset that doesn’t require cache files.

1 Like