How to download a model and run it with Ollama locally?

I have just installed Ollama on my MacBook Pro; now how do I download a model from Hugging Face and run it locally on my Mac?

Thank you for your response. Is there any other recommended tool for running models locally besides Ollama? Why should I use Ollama? Am I on the right track?

Certainly! Here are some alternative tools for running models locally:

  1. PyTorch
  2. TensorFlow
  3. ONNX Runtime
  4. OpenVINO
  5. TensorFlow Lite

Each tool offers unique features and advantages, so you can choose based on your specific requirements.


Thank you for the information, I really appreciate it :smiling_face:

Can’t find it… Any other options?

You can use git to clone the model to your local storage.
As long as you have git installed on your computer, just use the git clone command followed by the URL of the model on Hugging Face.
For example:
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
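
Note that the large weight files on the Hub are stored with Git LFS, so a plain git clone will only fetch small pointer files unless Git LFS is set up first. A minimal sketch, assuming Homebrew on macOS:

  # install and enable Git LFS so the clone fetches the real weight files
  brew install git-lfs
  git lfs install
  # then clone the model repo as above
  git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2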

So the downloaded model will be in quantized form, right? Can we use this to fine-tune?

I don't see any config.json file. Any idea?

@hopewise You do realize that both of @KeelyPowers's answers were generated by an LLM… right?

No! But I am not shocked, since this is the Hugging Face forum! :sweat_smile:

May I ask, after downloading a model, how can I use Ollama to load it?
My download is in Transformers format, with files like config.json, model-00001-of-00002.safetensors, pytorch_model-00003-of-00003.bin, special_tokens_map.json, and tokenizer.model.
There is no Modelfile, so how can I load it with Ollama?

Hi,

Ollama is a wrapper on top of llama.cpp (GitHub - ggerganov/llama.cpp: LLM inference in C/C++), which supports running any HF model from the terminal. I would recommend checking it out, along with LM Studio, which provides a nice UI on top of it and supports any HF model in the GGUF format.

Here are the steps to convert GGUF models from HF into the Ollama format
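
In short, the flow looks something like this. A rough sketch, where the paths and model names are only examples and the conversion script name may differ across llama.cpp versions:

  # 1. convert the downloaded Transformers-format weights to GGUF with llama.cpp
  python convert_hf_to_gguf.py ./Mistral-7B-Instruct-v0.2 --outfile mistral.gguf
  # 2. write a minimal Modelfile pointing Ollama at the local GGUF file
  echo 'FROM ./mistral.gguf' > Modelfile
  # 3. register the model with Ollama and run it
  ollama create mistral-local -f Modelfile
  ollama run mistral-local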


Though I have trouble understanding how to do that for an embeddings model.
How can I import a Hugging Face GGUF embeddings model into the Ollama format if the model card page does not include a Modelfile?

For example, this embeddings model: https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF

Or this one: https://huggingface.co/GritLM/GritLM-8x7B


I read these articles on how to convert a model to the Ollama format, but they are not clear on embeddings.

Is there a difference in how I am supposed to create the Modelfile for an embeddings model vs. an LLM? Where can I find the expected PARAMETER and TEMPLATE values for a given model if they are not provided?

To download and run a model with Ollama locally, follow these steps:

  1. Install Ollama: Ensure you have the Ollama framework installed on your machine.
  2. Download the Model: Use Ollama’s command-line interface to download the desired model, for example: ollama pull <model-name>.
  3. Run the Model: Execute the model with the command: ollama run <model-name>.
  4. Configure Settings: Adjust any necessary settings or parameters as required by your specific use case.

Refer to Ollama’s documentation for detailed instructions and additional options.
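
For example, a minimal session might look like this (llama3 is just an example model name from the Ollama library):

  # download a model from the Ollama library
  ollama pull llama3
  # chat with it interactively
  ollama run llama3
  # list the models installed locally
  ollama list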

This guy has the exact video you're looking for. The first half is about quantized/GGUF models, but around the 4:23 mark he gets into regular models, like what you're asking about.
