How to move cache between computers

I have a GPU server without internet access. I would like to run some experiments there, and for that I need to download datasets locally and then move the downloaded files to the server.

What is the correct procedure to do that? I just copied the .cache/huggingface/datasets directory hoping it would work, but the library still tries to access the internet. I think this may be related to the fact that some metadata (especially a lock file) in there seems to be tied to the user on my local machine, which is different from the server.

I tried to explicitly pass download_mode="reuse_cache_if_exists", and I also tried to pass data_dir directly, but I did not manage to load the cached dataset directly from disk in any case. An example even just with the mnist dataset would be welcome!

Hi! Instead of copying the entire cache directory, call Dataset.save_to_disk on your local machine to save the dataset to a specific directory, then move only that directory to the server. Finally, call datasets.load_from_disk on the server to load the dataset from the copied directory.

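Additionally, you can skip the manual copy step and write the dataset straight to the server over SFTP/SSH by passing an fsspec filesystem (the sftp backend requires paramiko). Note that the fs argument is deprecated since datasets 2.8.0 in favor of storage_options, so the call below applies to older versions: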

import fsspec
dset.save_to_disk(path, fs=fsspec.filesystem("sftp", host=host, port=port, username=username, password=password))