LLM model repository file format


I am confused about the format in which LLM models are saved in model repositories. I want to use Ollama to load my models. I downloaded some .gguf models and they work fine, since each one is a single file.
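For context, this is roughly how I load a single-file GGUF today (the model filename is just an example):

```shell
# Minimal Modelfile pointing at a local single-file GGUF
# (the filename here is illustrative)
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.1.Q8_0.gguf
EOF
# then: ollama create my-mistral -f Modelfile
# and:  ollama run my-mistral
```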

I see some models, like mistralai/Mistral-7B-v0.1 (main branch), that have multiple pytorch_model-*.bin files. I understand that this split is done by the transformers library when saving the model (sharding).
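From what I can tell, the shards come with an index file (pytorch_model.bin.index.json) that maps each tensor name to the shard holding it. A pure-Python sketch of that mapping, with made-up weight names and sizes:

```python
import json

# Sketch of a pytorch_model.bin.index.json; the weight names,
# shard count, and total_size below are illustrative, not real.
index = {
    "metadata": {"total_size": 14483464192},
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
        "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
        "model.layers.31.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    },
}

def shard_for(weight_name: str) -> str:
    """Return which shard file holds the given tensor."""
    return index["weight_map"][weight_name]

print(shard_for("lm_head.weight"))  # pytorch_model-00002-of-00002.bin
print(json.dumps(index["metadata"]))
```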

One of my questions is: if I save a PyTorch model in .pt format, how can I upload it to a repository in the format shown in the example? Where do I get the config file, and where do I get the tokenizer file?
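To show what I mean, here is my understanding of the files such a repo contains (empty placeholders only; as far as I can tell, calling save_pretrained() on the model and tokenizer objects writes the real versions):

```shell
# Sketch of the HF-style repo layout (empty placeholder files, names illustrative)
mkdir -p my-model-repo
touch my-model-repo/pytorch_model-00001-of-00002.bin  # weight shard 1
touch my-model-repo/pytorch_model-00002-of-00002.bin  # weight shard 2
touch my-model-repo/pytorch_model.bin.index.json      # tensor -> shard map
touch my-model-repo/config.json                       # architecture/hyperparameters
touch my-model-repo/tokenizer.json                    # tokenizer vocabulary/merges
touch my-model-repo/tokenizer_config.json             # tokenizer settings
ls my-model-repo
```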

And the main question is: how can I convert that repo into a single file that can be imported into Ollama? There are options to quantize it, but I want the full-precision model.
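From searching around, my best guess at the conversion path is llama.cpp's HF-to-GGUF converter, but I have not verified this (the script name and flags have changed between llama.cpp versions, so treat this as a sketch):

```shell
# Sketch: convert an HF repo directory into a single GGUF file at full precision.
# Script name and flags vary by llama.cpp version; check your checkout.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./Mistral-7B-v0.1 \
    --outfile mistral-7b-v0.1-f16.gguf --outtype f16
```

Is something like this the right approach for a non-quantized import?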

Also, if you could point me to some documentation about model formats, that would be appreciated. Thank you!