Hello! I have been using Salamandra to train an LLM recently, but for the past two weeks it doesn't get past this point:
/workspace/.local/lib/python3.11/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
The download then stalls at tokenizer.model (4.81 MB, a Xet-backed file in this repo). (Hugging Face)
That pattern (small files OK, Xet files stuck at 0%) is typical of issues with:
hf_xet (the Rust client used for Xet storage), or
network / firewall / DNS toward Xet endpoints, or
a corrupted cache entry for that file.
The Salamandra repo itself is fine and widely used in tools like LitGPT, Ollama wrappers, etc., so this is almost certainly not a Salamandra-specific bug. (Hugging Face)
The TRANSFORMERS_CACHE deprecation warning is just informational, not the root cause. (Hugging Face)
Cause 1: hf_xet / Xet backend issues
Background
Salamandra’s large files and tokenizer assets (tokenizer.model, tokenizer.json, safetensors shards) are stored via Xet. (Hugging Face)
Newer huggingface_hub uses hf_xet automatically for Xet-backed files if the package is installed. (Hugging Face)
There are multiple recent issues and forum posts where downloads of Xet-backed files hang at 0% due to hf_xet bugs or old versions. (GitHub)
What to do
Disable Xet in this environment (safest first step):
# Shell
export HF_HUB_DISABLE_XET=1 # see env var docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
Or at the very top of your Python script, before imports:
import os
os.environ["HF_HUB_DISABLE_XET"] = "1"  # must be set before any HF library is imported

from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("BSC-LT/salamandra-7b-instruct")
This tells huggingface_hub to avoid Xet entirely and use regular HTTP downloads instead. (Hugging Face)
Uninstall hf-xet if problems persist:
pip uninstall -y hf-xet xet
There is a known issue where older hf-xet versions (for example 1.0.0) pass “available” checks but then fail at runtime, causing hangs. (GitHub)
Cause 2: Network / VPN / firewall blocking Xet endpoints
Background
Xet uses extra domains like cas-bridge.xethub.hf.co, cas-server.xethub.hf.co, etc. (GitHub)
On some VPNs or corporate networks, these domains don’t resolve or are blocked, so Xet downloads never start (progress bar stays at 0%). Similar “stuck at 0% forever” reports exist for large HF downloads behind VPN/firewalls. (GitHub)
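A quick way to test whether your environment can even reach those endpoints is a DNS probe. This is a minimal sketch using only the Python standard library; if these hostnames do not resolve, Xet transfers can never start:

```python
# Probe DNS resolution for the Xet CAS hostnames. If these do not
# resolve, Xet-backed downloads will sit at 0% indefinitely.
import socket

def resolves(host: str) -> bool:
    """Return True if DNS resolution succeeds for `host`."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False

for host in ("cas-bridge.xethub.hf.co", "cas-server.xethub.hf.co"):
    status = "OK" if resolves(host) else "UNRESOLVED (likely blocked)"
    print(f"{host}: {status}")
```

If both hosts fail to resolve while huggingface.co itself works, the problem is on your network side (VPN/firewall/DNS), not in the libraries.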
What to do
Retry the download with HF_HUB_DISABLE_XET=1 already set, so you don't go down the Xet code path again. If you are on a VPN or a corporate network, also try from a different network, or ask for the xethub.hf.co domains to be allow-listed.
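If you suspect a corrupted cache entry (the third possibility listed at the top), it can also help to delete the cached copy of the repo before retrying. A sketch, assuming the default cache location under ~/.cache/huggingface (adjust the path if you set HF_HOME):

```shell
# Start the retry from a clean slate: drop the cached repo snapshot
# (path assumes the default Hugging Face cache layout)
rm -rf ~/.cache/huggingface/hub/models--BSC-LT--salamandra-7b-instruct
# and keep Xet disabled for the retry
export HF_HUB_DISABLE_XET=1
```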
Cause 3: Version combo (hub / transformers / hf_xet)
Background
There are versions where HF_HUB_DISABLE_XET is buggy (e.g. huggingface_hub 0.34.1 in one reported issue), or where progress bars are misleading. (GitHub)