How to download a model and run it with Ollama locally?

I have just installed Ollama on my MacBook Pro; now how do I download a model from Hugging Face and run it locally on my Mac?

Thank you for your response. Is there any other recommended tool for running models locally besides Ollama? Why should I use Ollama? Am I on the right track?

Certainly! Here are some alternative tools for running models locally:

  1. PyTorch
  2. TensorFlow
  3. ONNX Runtime
  4. OpenVINO
  5. TensorFlow Lite

Each tool offers unique features and advantages, so you can choose based on your specific requirements.


Thank you for the information, I really appreciate it :smiling_face:

Can’t find it… Any other options?

You can use git to clone the model to your local storage.
As long as you have git installed on your computer, just use the git clone command followed by the URL of the model on Hugging Face.
For example:
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
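
Note that the large weight files on the Hub are stored with Git LFS, so a plain git clone will only fetch small pointer files unless Git LFS is set up first. A minimal sketch, assuming Homebrew on macOS:

  # install and enable Git LFS so the clone fetches the real weight files
  brew install git-lfs
  git lfs install
  # then clone the model repo as above
  git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2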

So the downloaded model will be in quantized form, right? Can we use this to fine-tune?

I don't see any config.json file. Any idea?

@hopewise You do realize that both of @KeelyPowers's answers were generated by an LLM… right?

No! But I am not shocked, since this is the Hugging Face forum! :sweat_smile:

May I ask, after downloading a model, how can I use Ollama to load it?
My download is in Transformers format, with files like config.json, model-00001-of-00002.safetensors, pytorch_model-00003-of-00003.bin, special_tokens_map.json, and tokenizer.model.
There is no Modelfile, so how can I load it with Ollama?

Hi,

Ollama is a wrapper on top of llama.cpp (GitHub - ggerganov/llama.cpp: LLM inference in C/C++), which supports running any HF model from the terminal. I would recommend checking it out, along with LM Studio, which provides a nice UI on top of it and supports any HF model in the GGUF format.

Here are the steps to convert GGUF models from HF into the Ollama format
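
In short, the flow looks something like this. A rough sketch, where the paths and model names are only examples and the conversion script name may differ across llama.cpp versions:

  # 1. convert the downloaded Transformers-format weights to GGUF with llama.cpp
  python convert_hf_to_gguf.py ./Mistral-7B-Instruct-v0.2 --outfile mistral.gguf
  # 2. write a minimal Modelfile pointing Ollama at the local GGUF file
  echo 'FROM ./mistral.gguf' > Modelfile
  # 3. register the model with Ollama and run it
  ollama create mistral-local -f Modelfile
  ollama run mistral-local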


Though I have trouble understanding how to do that for an embeddings model.
How can I import a Hugging Face GGUF embeddings model into the Ollama format if the model card page does not include a Modelfile?

For example, this embeddings model: https://huggingface.co/dranger003/SFR-Embedding-Mistral-GGUF

Or this one: https://huggingface.co/GritLM/GritLM-8x7B


I read these articles on how to convert a model to the Ollama format, but they are not clear on embeddings.

Is there a difference in how I am supposed to create the Modelfile for an embeddings model vs. an LLM? Where can I find the expected PARAMETER and TEMPLATE values for a given model if they are not provided?

To download and run a model with Ollama locally, follow these steps:

  1. Install Ollama: Ensure you have the Ollama framework installed on your machine.
  2. Download the Model: Use Ollama’s command-line interface to download the desired model, for example: ollama pull <model-name>.
  3. Run the Model: Execute the model with the command: ollama run <model-name>.
  4. Configure Settings: Adjust any necessary settings or parameters as required by your specific use case.

Refer to Ollama’s documentation for detailed instructions and additional options.
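
For example, a minimal session might look like this (llama3 is just an example model name from the Ollama library):

  # download a model from the Ollama library
  ollama pull llama3
  # chat with it interactively
  ollama run llama3
  # list the models installed locally
  ollama list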

This guy has the exact video you're looking for. The first half is about quantized/GGUF models, but around the 4:23 mark he gets into regular models, like what you're asking about.
