I have a GPU server without internet access. I would like to run some experiments there, so I need to download the datasets on my local machine and then move the downloaded files to the server.
What is the correct procedure to do that? I just copied the `.cache/huggingface/datasets` directory, hoping it would work, but the library still tries to access the internet. I think this may be related to the fact that some metadata in there (especially a lock file) seems to be tied to the user on my local machine, which is different from the user on the server.
I tried to explicitly pass `download_mode="reuse_cache_if_exists"`, and I also tried to pass `data_dir` directly, but in neither case did I manage to load the cached dataset from disk. An example, even just with the `mnist` dataset, would be welcome!
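
To make the goal concrete, here is a rough sketch of the workflow I am after. It assumes that `save_to_disk` / `load_from_disk` from the `datasets` library is a suitable route for this, which I have not verified on the server. The function names `export_dataset` and `load_exported` and the directory name `mnist_exported` are placeholders of my own, not anything the library prescribes.

```python
from pathlib import Path

# Hypothetical directory name; this is the folder that gets copied to the server.
EXPORT_DIR = "mnist_exported"


def export_dataset(name: str = "mnist", out_dir: str = EXPORT_DIR) -> Path:
    """Run on the local machine, which has internet access."""
    # Imported lazily so this file can be read without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(name)   # downloads and prepares the dataset
    dataset.save_to_disk(out_dir)  # writes a self-contained copy to out_dir
    return Path(out_dir)


def load_exported(out_dir: str = EXPORT_DIR):
    """Run on the offline server, after copying out_dir over."""
    from datasets import load_from_disk

    # Reads only from the given directory, not from ~/.cache/huggingface.
    return load_from_disk(out_dir)


if __name__ == "__main__":
    # Local machine: export, then copy EXPORT_DIR to the server (e.g. with scp).
    print(export_dataset())
```

On the server I would then call `load_exported()` instead of `load_dataset("mnist")`. If there is a way to make `load_dataset` itself work from a copied cache directory, that would be even better.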