For example, in mistralai/Mistral-7B-Instruct-v0.3 there are:
model-00001-of-00003.safetensors
model-00002-of-00003.safetensors
model-00003-of-00003.safetensors
and there is also consolidated.safetensors, which looks to be a "concatenation" of all the other safetensors. What are all these safetensors for? Which one is actually the model that I want to use, if that makes any sense? Do I need all the safetensors for finetuning my own model?
To put it simply, a single file that size is awkward to transfer over a network, so the checkpoint is split into shards. If you load it with one of HF's libraries (transformers, etc.), the shards are stitched back together automatically, so there is no need to worry about it. If you want to handle the files yourself, please refer to the page below.
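For illustration, this is roughly what loading looks like with transformers; `from_pretrained()` reads the shard index and pulls in all three shard files by itself (just a sketch, not the only way to load it):

```python
# Sketch: transformers resolves model.safetensors.index.json and loads
# all three model-0000x-of-00003.safetensors shards on its own.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```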
Do I need all the safetensors for finetuning my own model?
Yes. Each shard holds a different slice of the weights, so the full set of shards is needed to reconstruct the model.
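If it helps, you can see why every shard is needed by looking at model.safetensors.index.json, which maps each weight name to the shard that stores it. A rough sketch using huggingface_hub (you may need to be logged in and have accepted the model's terms):

```python
import json
from huggingface_hub import hf_hub_download

# The index file maps every weight name to the shard that stores it,
# so dropping any one shard would leave part of the model missing.
index_path = hf_hub_download(
    "mistralai/Mistral-7B-Instruct-v0.3",
    "model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

print(sorted(set(weight_map.values())))
# ['model-00001-of-00003.safetensors', 'model-00002-of-00003.safetensors',
#  'model-00003-of-00003.safetensors']
```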
I see. If HF is going to concatenate the sharded model anyways, why should I download consolidated.safetensors? Or am I misunderstanding something?
The merging is done by the library in memory; on disk the files usually stay split. As long as you remember that an HF model is the whole repo folder, not a single file, you won't go wrong.
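As a quick illustration of the "folder, not file" point, you can list everything the repo contains with huggingface_hub; a small sketch:

```python
from huggingface_hub import list_repo_files

# The "model" is the whole repo folder: config, tokenizer files, the
# sharded weights plus their index, and consolidated.safetensors.
for name in list_repo_files("mistralai/Mistral-7B-Instruct-v0.3"):
    print(name)
```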
I don’t know the full background, since I haven’t been on HF that long, but I think splitting files into 5 GB or 10 GB units made HF’s storage and git handling easier at the time. Also, if a 5 GB transfer fails, you just download that shard again, but if a 100 GB transfer fails, you’ve lost 100 GB. There are now models with shards of around 50 GB, so larger shards are probably fine these days, but there’s not much point in changing something that’s working.
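The shard size is also just an argument when you save your own finetuned model; a sketch assuming transformers, with a hypothetical output folder name:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
# max_shard_size controls how the checkpoint is split on disk; the shards
# and a fresh model.safetensors.index.json are written to the folder.
model.save_pretrained("my-finetuned-mistral", max_shard_size="5GB")
```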