LLM model repository file format


I am confused about the format in which LLM models are saved in model repositories. I want to use Ollama to load my models. I downloaded some .gguf models and they work fine, since each one is a single file.
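For context, this is roughly how I load a single-file GGUF today (the model filename is just an example):

```shell
# Minimal Modelfile pointing at a local single-file GGUF
# (the filename here is illustrative)
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.1.Q8_0.gguf
EOF
# then: ollama create my-mistral -f Modelfile
# and:  ollama run my-mistral
```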

I see some models, like mistralai/Mistral-7B-v0.1 (main branch), that have multiple pytorch_model-*.bin files. I understand that this split is done by the transformers library when saving the model (sharding).
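From what I can tell, the shards come with an index file (pytorch_model.bin.index.json) that maps each tensor name to the shard holding it. A pure-Python sketch of that mapping, with made-up weight names and sizes:

```python
import json

# Sketch of a pytorch_model.bin.index.json; the weight names,
# shard count, and total_size below are illustrative, not real.
index = {
    "metadata": {"total_size": 14483464192},
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
        "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
        "model.layers.31.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    },
}

def shard_for(weight_name: str) -> str:
    """Return which shard file holds the given tensor."""
    return index["weight_map"][weight_name]

print(shard_for("lm_head.weight"))  # pytorch_model-00002-of-00002.bin
print(json.dumps(index["metadata"]))
```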

One of my questions is: if I save a PyTorch model in .pt format, how can I upload it to a repository in the format shown in the example? Where do I get the config file, and where do I get the tokenizer file?
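To show what I mean, here is my understanding of the files such a repo contains (empty placeholders only; as far as I can tell, calling save_pretrained() on the model and tokenizer objects writes the real versions):

```shell
# Sketch of the HF-style repo layout (empty placeholder files, names illustrative)
mkdir -p my-model-repo
touch my-model-repo/pytorch_model-00001-of-00002.bin  # weight shard 1
touch my-model-repo/pytorch_model-00002-of-00002.bin  # weight shard 2
touch my-model-repo/pytorch_model.bin.index.json      # tensor -> shard map
touch my-model-repo/config.json                       # architecture/hyperparameters
touch my-model-repo/tokenizer.json                    # tokenizer vocabulary/merges
touch my-model-repo/tokenizer_config.json             # tokenizer settings
ls my-model-repo
```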

And the main question is: how can I convert that repo into a single file that can be imported into Ollama? There are options to quantize it, but I want the full-precision model.
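From searching around, my best guess at the conversion path is llama.cpp's HF-to-GGUF converter, but I have not verified this (the script name and flags have changed between llama.cpp versions, so treat this as a sketch):

```shell
# Sketch: convert an HF repo directory into a single GGUF file at full precision.
# Script name and flags vary by llama.cpp version; check your checkout.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./Mistral-7B-v0.1 \
    --outfile mistral-7b-v0.1-f16.gguf --outtype f16
```

Is something like this the right approach for a non-quantized import?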

Also, if you could point me to some documentation about model formats, that would be appreciated. Thank you!