I keep hundreds of gigabytes of model weights and use a handful of different local AI apps. A frequent annoyance is that every app has its own registry and storage location, so I end up with duplicate model files eating up disk space. Since most of those models come from Hugging Face anyway, I wanted to see if I could make Hugging Face the single source of truth for all my models.
So, I created UMR, the Unified Model Registry for all your local AI Apps!
It lets you add one canonical copy of whatever model you’re using, straight from Hugging Face Cache, and then link it to all your tools like Ollama, LM Studio, Jan, or llama.cpp. Linking uses the same model that you already downloaded, doesn’t require extra storage, and is super fast.
How to Set it Up
See the second image for a more graphical step-by-step walkthrough.
- Install UMR via NPM or your favorite JS package manager:
npm i -g umr-cli
- Add any Hugging Face GGUF model that you want. This CLI will let you interactively choose a quant file if applicable. After it finishes downloading, you’ll get its UMR Model ID. HF models already available on your device will be added straight from HF Cache.
umr add hf ggml-org/gemma-4-E2B-it-GGUF
- Use that model ID to add it to any supported local AI app. For example, for the q8 version:
# Link the model to Ollama
umr link ollama gemma-4-e2b-it-q8-0
# Link the model to LM Studio
umr link lmstudio gemma-4-e2b-it-q8-0
# Link the model to Jan
umr link jan gemma-4-e2b-it-q8-0
The model should now be available in each of those apps!
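If you link the same model into several apps, the three commands above can be wrapped in a small loop. This is only a sketch: link_everywhere and the UMR_BIN override are hypothetical names of mine, not part of UMR, and the loop relies only on the umr link <app> <model-id> form shown above.

```shell
# Hypothetical helper (not part of UMR): link one model ID into several apps.
# UMR_BIN is an assumed override so the loop can be dry-run, e.g. UMR_BIN=echo.
link_everywhere() {
  model_id=$1
  shift
  for app in "$@"; do
    # Stop at the first app that fails to link.
    "${UMR_BIN:-umr}" link "$app" "$model_id" || return 1
  done
}
```

For example, link_everywhere gemma-4-e2b-it-q8-0 ollama lmstudio jan runs the same three links as above, and prefixing the call with UMR_BIN=echo only prints the commands instead of running them.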
If you want to access the GGUF file managed by UMR directly, you can use umr show with the --path flag. For example, let’s use it with llama.cpp:
# Run llama.cpp with a UMR-managed model
llama-cli -m "$(umr show gemma-4-e2b-it-q8-0 --path)"
How Does It Work?
UMR itself does not necessarily store your models. It simply knows where to find them after you register them. For example, when you run umr add hf, the model is still downloaded to (or fetched from) the Hugging Face Cache. UMR just takes note of where it lives in the HF Cache.
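To see what is already sitting in your HF Cache before adding anything, a small helper like the one below can list the GGUF files there. It is a sketch and not part of UMR: list_cached_gguf is a hypothetical name, and the default location assumes the standard huggingface_hub layout (HF_HUB_CACHE, else $HF_HOME/hub, else ~/.cache/huggingface/hub).

```shell
# Hypothetical helper (not part of UMR): list GGUF files in a Hugging Face hub cache.
# Assumed default location: HF_HUB_CACHE, else $HF_HOME/hub, else ~/.cache/huggingface/hub.
list_cached_gguf() {
  cache_dir=${1:-${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}}
  # Snapshot entries are usually symlinks into blobs/, so match by name only.
  find "$cache_dir" -path '*/snapshots/*' -name '*.gguf' 2>/dev/null | sort
}
```

Calling list_cached_gguf with no arguments scans the default cache; passing a directory scans that one instead.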
You can also add a model manually with umr add ./path/to/file.gguf, which will clone it locally into UMR’s own store.
Then, when you link to a client app like LM Studio, UMR chooses between hardlinking the model file into the app’s own store and simply pointing the app at UMR’s managed path. Either way, no model data is copied, so linking is super fast and uses no extra storage.
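To make the "no extra storage" part concrete, here is a rough sketch of the hardlink idea. It is not UMR's actual code: link_model is a hypothetical helper, and the symlink fallback is my own stand-in for "pointing the app at the path" when a hardlink is not possible (for example across filesystems).

```shell
# Hypothetical sketch (not UMR's actual code): place a model in an app's store
# without copying it. Try a hardlink first; if that fails, fall back to a
# symlink as a stand-in for pointing the app at the original path.
# Pass an absolute source path so the symlink fallback resolves correctly.
link_model() {
  src=$1
  dest=$2
  mkdir -p "$(dirname "$dest")" || return 1
  if ln "$src" "$dest" 2>/dev/null; then
    echo hardlink
  else
    ln -s "$src" "$dest" && echo symlink
  fi
}

# Demo with a throwaway file standing in for a GGUF model.
demo_dir=$(mktemp -d)
printf 'fake weights' > "$demo_dir/model.gguf"
link_model "$demo_dir/model.gguf" "$demo_dir/app-store/model.gguf"
# -ef is true when both names refer to the same file (same device and inode).
[ "$demo_dir/model.gguf" -ef "$demo_dir/app-store/model.gguf" ] && echo "same file, no extra copy"
rm -rf "$demo_dir"
```

A hardlink is just a second name for the same data on disk, which is why the app sees a normal file in its own store while the disk holds only one copy.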
Feedback and Contribution
I’m open to feedback, including new features/client apps you want to see me integrate, new model sources you want to see me add, and questions!
UMR is also completely open source on GitHub: EvanZhouDev/umr (The Unified Model Registry for all your local AI apps).
Feel free to contribute!