How to move cache between computers

I have a GPU server without internet access. I would like to run some experiments there, so I need to download datasets on my local machine and move the downloaded files to the server.

What is the correct procedure to do that? I just copied the .cache/huggingface/datasets directory hoping it would work, but the library still tries to access the internet. I think this may be related to the fact that some metadata (especially a lock file) in there seems to be tied to the user on my local machine, which is different from the server.

I tried to explicitly pass download_mode="reuse_cache_if_exists", and I also tried to pass data_dir directly, but I did not manage to load the cached dataset directly from disk in any case. An example even just with the mnist dataset would be welcome!


Hi! Instead of copying the entire cache directory, use Dataset.save_to_disk on your local machine to save the dataset to a specific directory, then move only that directory to the server. Finally, call datasets.load_from_disk on the server to load the dataset from the copied directory.
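Since the question asked for an mnist example, here is a minimal sketch of that workflow. The helper names `export_dataset` and `import_dataset` and the directory name `mnist_train` are made up for illustration; only `load_dataset`, `Dataset.save_to_disk` and `load_from_disk` come from the library.

```python
def export_dataset(name: str, split: str, out_dir: str) -> str:
    """Run on the machine WITH internet access: download a split and save it to out_dir."""
    # Imported lazily so the helpers can be defined without touching the network.
    from datasets import load_dataset

    dset = load_dataset(name, split=split)
    dset.save_to_disk(out_dir)  # writes Arrow files plus metadata into out_dir
    return out_dir


def import_dataset(in_dir: str):
    """Run on the offline server, after copying in_dir there (e.g. with scp or rsync)."""
    from datasets import load_from_disk

    return load_from_disk(in_dir)  # reads from local disk only


# Local machine (hypothetical directory name):
#     export_dataset("mnist", "train", "mnist_train")
# Copy the "mnist_train" directory to the server, then on the server:
#     dset = import_dataset("mnist_train")
```

The saved directory does not depend on the local cache layout or on the user who created it, so the lock-file issue mentioned in the question should not come up.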

Additionally, you can skip the manual copy by saving the dataset directly to the server over SFTP/SSH (the fs argument used here was deprecated in datasets 2.8.0 in favor of storage_options):

import fsspec
dset.save_to_disk(path, fs=fsspec.filesystem("sftp", host=host, port=port, username=username, password=password))
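On releases where `fs` is no longer accepted, the same transfer might look like the sketch below, which passes a remote URI together with `storage_options`. The helper names `sftp_target` and `save_over_sftp` are hypothetical, and I am assuming the `sftp` backend of fsspec is usable on your machine (it relies on paramiko).

```python
def sftp_target(host: str, port: int, username: str, password: str, remote_path: str):
    """Build the remote URI and the storage_options dict for an SFTP destination."""
    # remote_path is expected to be absolute on the server, e.g. "/data/mnist_train".
    uri = f"sftp://{host}{remote_path}"
    storage_options = {"port": port, "username": username, "password": password}
    return uri, storage_options


def save_over_sftp(dset, host, port, username, password, remote_path):
    """Save a datasets.Dataset straight onto the server instead of to local disk."""
    uri, storage_options = sftp_target(host, port, username, password, remote_path)
    # storage_options is forwarded to the fsspec backend selected by the URI scheme.
    dset.save_to_disk(uri, storage_options=storage_options)
    return uri
```

On the server, the dataset is then loaded from the plain local path with `datasets.load_from_disk(remote_path)`.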