Hi all, I’m wondering whether anyone has experience writing a dataset builder that fetches its data with rsync and a password? Unfortunately I don’t have control over where or how the source data is stored, and it would be a much better experience if my team could download the dataset faster with the normal download_and_extract method rather than custom_download, which downloads one file at a time.
What I have been doing is using a simple function like this to download the files with the custom_download method of the dl_manager:
import subprocess
from pathlib import Path

def rsync_download(url, dest, username):
    Path(dest).mkdir(parents=True, exist_ok=True)
    # sshpass -f reads the password from the file named "password"
    command = ["sshpass", "-fpassword", "rsync", "-auxvL", "--delete", f"{username}@{url}", dest]
    subprocess.run(command, check=True)
I looked at fsspec’s docs, and it appears that they recently added support for rsync, but I haven’t found a way to use it. Has anyone figured something out, or does anyone know of any alternatives?
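In the meantime, one alternative I have been considering is to run several rsync processes at once from a thread pool, so the time is not spent on one transfer at a time. Below is a rough sketch of that idea, not something I have tested against a real server. It assumes each entry in `urls` is an independent rsync source and that the server tolerates concurrent connections. The names `build_rsync_command`, `parallel_rsync_download`, `password_file`, `max_workers`, and the `run` parameter are all made up for illustration and are not part of the datasets or fsspec APIs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_rsync_command(url, dest, username, password_file="password"):
    # sshpass -f <file> reads the password from a file; same flags as my function above.
    return ["sshpass", "-f", password_file, "rsync", "-auxvL", "--delete",
            f"{username}@{url}", str(dest)]


def rsync_download(url, dest, username, run=subprocess.run):
    # `run` is injectable (hypothetical parameter) so the logic can be tried without a server.
    Path(dest).mkdir(parents=True, exist_ok=True)
    run(build_rsync_command(url, dest, username), check=True)
    return str(dest)


def parallel_rsync_download(urls, dest_root, username, max_workers=4, run=subprocess.run):
    # Each source gets its own subdirectory so that --delete in one transfer
    # cannot remove files written by another.
    dest_root = Path(dest_root)
    jobs = [(url, dest_root / f"source_{i}") for i, url in enumerate(urls)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(rsync_download, url, dest, username, run)
                   for url, dest in jobs]
        # result() re-raises any CalledProcessError from a failed transfer.
        return [future.result() for future in futures]
```

Threads are enough here because each worker just waits on an external rsync process. Whether this actually ends up faster depends on the server and the network, so I would treat `max_workers` as something to tune.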