Recommended tools for managing, downloading, and storing models on a LAN

I've started running local inference, and when I try to use models approaching or exceeding 100 GB, the file downloads sometimes stall or fail. My WAN link can be unreliable, and when files are not chunked, a failed download means downloading the entire file over again.

Are there any applications I can run locally to batch-manage downloads from HF and maintain storage of Hugging Face models, so that models are downloaded and sitting somewhere on my fast LAN, ready to be served to any other hosts or VMs?

Are there any self-hosted storage/repository applications that can cache HF content and work with the HF cache and download tooling?

If not, what are the recommended methods for storing the large model files so that they remain organized in a format similar to HF's and can be readily and quickly served to hosts on the LAN?

Should I simply use a local Git server with Git LFS? Or are people just downloading models from HF to NFS or SMB servers and copying the files from there to other hosts when necessary?

tia


Some options…


Use one of three patterns: 1) a shared Hugging Face cache on NFS/SMB, 2) a real proxy cache (Artifactory or Nexus), 3) a lightweight self-hosted mirror. Add resumable downloaders (hf_transfer, hf_xet, or aria2c). Keep the Hub cache layout to stay compatible with all clients.

1) Shared HF cache over LAN (fast, simple)

  • Put the cache on your NAS and point every host at it. The Hub cache is versioned and supports symlinks; on Windows the cache falls back to copies when symlinks are unavailable and uses more disk. Use offline mode after prefetching. (Hugging Face)

  • Preload models by commit with snapshot_download and expose “pretty” trees via symlinks to avoid duplicates. (Hugging Face)


# docs: https://huggingface.co/docs/huggingface_hub/en/guides/cli
pip install -U "huggingface_hub[cli,hf_transfer]"  # https://github.com/huggingface/hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=/srv/hf  # shared mount (NFS/SMB)

# optional: use local_dir symlinks to a clean folder tree
python - <<'PY'
# docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/file_download
from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-3.1-70B",
    revision="<commit-sha>",            # pin for reproducibility
    local_dir="/models/llama-3.1-70b",  # human-friendly path
    local_dir_use_symlinks=True,        # note: deprecated in newer huggingface_hub releases, which manage local_dir downloads themselves
)
PY

# after warming the cache:
export HF_HUB_OFFLINE=1  # serve from LAN only

2) Proxy cache (best UX for teams)

  • JFrog Artifactory: native “Hugging Face” repo type. Create a remote HF repo and point clients at it with HF_ENDPOINT. WAN downloads happen once; LAN serves everything else. Docs updated 2025. (JFrog)

  • Sonatype Nexus: a "huggingface (proxy)" repository type has been supported since Feb 2025, with the same HF_ENDPOINT client knob. There is a known open issue with some Xet-backed large files as of May 22, 2025, so test your models. (help.sonatype.com)


# Point HF clients at your proxy
# Artifactory docs: https://jfrog.com/help/r/jfrog-artifactory-documentation/hugging-face-repositories
# Nexus docs: https://help.sonatype.com/en/hugging-face-repositories.html
export HF_ENDPOINT="https://repo.example.com/api/huggingface/hub"
hf download mistralai/Mixtral-8x7B-Instruct-v0.1 --revision <sha>  # "hf" CLI (huggingface_hub >= 0.34); use "huggingface-cli download" on older versions

3) Lightweight self-host mirrors (DIY)

  • Olah: on-demand HF mirror with block-level caching. Point clients via HF_ENDPOINT; see the client-side sketch below. Good for labs without a full artifact manager. (GitHub)

  • Other community mirrors exist (hf-mirror, light-hf-proxy); evaluate support and maintenance risk. (GitHub)
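
For example, a minimal client-side sketch, assuming an Olah mirror is already running and reachable at olah.lan:8090 (host is a placeholder; the port matches the rollout below):

# assumption: Olah mirror at http://olah.lan:8090 (placeholder host)
export HF_ENDPOINT="http://olah.lan:8090"
hf download Qwen/Qwen2.5-7B-Instruct --revision <commit-sha>  # first pull fills the mirror over WAN; later pulls stay on the LAN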

Make big downloads reliable

  • Prefer sharded weights. HF has a hard 50 GB per file limit; large repos shard by design to avoid restart-from-zero failures. (Hugging Face)

  • Use hf_transfer for high-throughput, resumable transfers; enabled via env var. (GitHub)

  • Xet-backed repos: install hf_xet (bundled since huggingface_hub 0.32.0). You get chunk-level dedupe and resilient partial reuse; tune or disable the local Xet chunk cache with env vars. (Hugging Face)

  • For one huge file, use aria2c segmented downloads with resume. (Gist)


# Xet knobs (docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)
export HF_XET_CHUNK_CACHE_SIZE_BYTES=20000000000  # 20 GB, or 0 to disable
# aria2 manual: https://aria2.github.io/manual/en/html/aria2c.html
aria2c -c -x12 -s12 --min-split-size=10M "https://huggingface.co/.../model-00001-of-00012.safetensors"

Storage layout and serving

  • Keep the Hub cache shape. Set HF_HOME or HUGGINGFACE_HUB_CACHE; use a local_dir download when you need a clean model folder without duplicating blobs (note that local_dir_use_symlinks is deprecated in newer huggingface_hub releases). (Hugging Face)

  • Share that cache read-only over NFS/SMB. This is a common HPC pattern; many users symlink their ~/.cache/huggingface/hub to a shared mount. (Hugging Face Forums)

  • Model servers: vLLM defaults to the HF cache; you can set a download directory or env var to point it at your shared path, as in the sketch below. (docs.vllm.ai)
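
A minimal sketch of pointing vLLM at the shared mount (model name, paths, and port are placeholders; --download-dir is vLLM's argument for overriding the weight directory):

# option 1: reuse the shared HF cache via env var
HF_HOME=/srv/hf vllm serve meta-llama/Llama-3.1-70B --port 8000
# option 2: pass the cache directory explicitly
vllm serve meta-llama/Llama-3.1-70B --download-dir /srv/hf/hub --port 8000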

When Git LFS or plain copies are a bad fit

  • Cloning giant repos via Git LFS is slower and adds Git metadata; HF recommends programmatic downloads (hf_hub_download / snapshot_download) and cache reuse instead, as in the sketch below. (Hugging Face)
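
As a contrast, a hedged sketch of the programmatic route (repo, revision, and include patterns are placeholders), which resumes from the cache instead of re-cloning:

# instead of: git lfs install && git clone https://huggingface.co/<org>/<repo>
hf download meta-llama/Llama-3.1-70B --revision <commit-sha> --include "*.safetensors" "*.json"
# interrupted runs resume; blobs land in the shared cache, not a .git directory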

Known caveats (2025-10-13)

  • Nexus’s HF proxy may fail on some Xet-backed large files. Validate with your exact models; Artifactory works today. (GitHub)

  • Symlink behavior varies on Windows; cache still works but uses more space. (Hugging Face)


Minimal rollouts

A) Team, low ops: shared cache

  1. Export /srv/hf over NFS (see the sketch after these steps).

  2. HF_HOME=/srv/hf on all hosts.

  3. Warm with hf download <repo> --revision <sha> (or snapshot_download). (Hugging Face)
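
A minimal sketch of steps 1-2 (subnet, hostnames, and paths are placeholders; export read-only so clients cannot corrupt the cache, and warm it from a host with a read-write mount):

# on the NAS: add to /etc/exports (read-only so clients cannot corrupt the cache)
echo '/srv/hf 192.168.1.0/24(ro,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra
# on each client host: mount the share and point HF at it
sudo mount -t nfs nas.lan:/srv/hf /srv/hf
export HF_HOME=/srv/hf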

B) Team, governance: proxy cache

  1. Create Artifactory HF remote repo.

  2. HF_ENDPOINT=https://artifactory/....

  3. Pull once, serve over LAN. (JFrog)

C) DIY: Olah mirror

  1. Deploy Olah.

  2. HF_ENDPOINT=http://olah:8090. (GitHub)


Similar cases

  • A shared NFS cache across users is discussed and used in practice on the HF forums and Stack Overflow. (Hugging Face Forums)

  • HPC guidance advises relocating HF caches to shared storage. (docs.alliancecan.ca)

  • Enterprise teams proxy HF via Artifactory/Nexus to centralize model pulls. (JFrog)


Curated references

Official HF docs

  • Cache internals, env vars, downloading, symlink notes, offline mode. (Hugging Face)

  • Storage limits (50 GB/file). (Hugging Face)

Proxy repos

  • Artifactory Hugging Face repositories. (JFrog)

  • Nexus Hugging Face proxy repositories. (help.sonatype.com)

Mirrors and tools

  • Olah mirror, hf-mirror, light-hf-proxy. (GitHub)

  • hf_transfer and aria2. (GitHub)

Xet backend

  • Using hf_xet, env tuning, and xet-core overview. (Hugging Face)

@John6666 Which of these options have you used, and do you have any opinions on them? Which did you settle on, and are you still using it?

Can anyone else comment on how you are sharing models across several users and/or hosts?


I only use one PC, so I’ve never used any of them…