Understanding where model weights are stored for research project on AI openness

Hello there,

I have a question regarding how model weights are generally stored on Hugging Face. I am conducting research on the openness of AI models and understanding when model weights are published and what effects this might have is crucial. One aspect of my project therefore involves exploring the different file formats in which model weights can be stored (e.g., .pt files, .bin files, or other formats).

If anyone has pointers or resources, I would greatly appreciate it. I realize this is a broad set of questions, so thank you in advance for your help and guidance.

Thank you again, and apologies for my ignorance regarding this matter.

Best,
Fabian


Hello. Since it’s a broad topic, let’s start with the format. There are no particular restrictions on the file format of the weights of models uploaded to Hugging Face. People upload whatever they like, from the Python pickle-based files you mentioned (.pt, .bin) to quantized GGUF files, etc.

However, the safetensors file format is the one usually recommended by Hugging Face, and it has been widely used for the past few years. It is similar to Python’s pickle in terms of how it is used, but it is designed to eliminate security risks as far as possible: you can think of it as containing only metadata and a state_dict (tensor names, dtypes, shapes, and raw data), with no executable code.
Because this format is recommended, it also has advantages when searching on the Hub; for example, the metadata can be read with relatively low overhead, without downloading the tensor data.
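To make the “metadata plus raw tensors” point concrete, here is a minimal stdlib-only sketch of the safetensors on-disk layout, based on the published spec (an 8-byte little-endian header length, a JSON header, then the raw tensor buffer). The tensor name `weight` and the values are made up for illustration:

```python
import json
import struct

# Build a minimal safetensors-style blob in memory: one float32 tensor
# "weight" of shape (2, 2). Layout per the safetensors spec:
#   [8-byte little-endian header length][JSON header][raw tensor bytes]
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 bytes of float32 values
header = {
    "weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]},
    "__metadata__": {"format": "pt"},  # optional free-form metadata
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Reading the metadata touches only the header, never the tensor data --
# which is why tools can index safetensors files with low overhead.
(n,) = struct.unpack("<Q", blob[:8])
meta = json.loads(blob[8 : 8 + n])
print(meta["weight"]["shape"])  # [2, 2]
```

Since the header is plain JSON with no code in it, parsing it is safe, unlike unpickling a .pt or .bin file.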


Thank you for your response. This helps me a lot!


In addition to @John6666’s response: the Python pickle format has a security vulnerability, because malicious models can be uploaded in that format and may remain undetected on HF. Also, GGUF files are mostly quantized versions of an original safetensors model.
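To show why pickle-based weight files are risky, here is a small, harmless, stdlib-only illustration: pickle lets an object’s `__reduce__` return any callable plus arguments, and `pickle.loads()` then calls it. The flag name `PWNED` is made up for the demo; a real attacker would call something like `os.system` instead:

```python
import builtins
import pickle

# Pickle serializes "how to rebuild" an object. __reduce__ may return any
# (callable, args) pair, and pickle.loads() *calls* it -- so a malicious
# .pt/.bin file can run code the moment it is loaded.
class Payload:
    def __reduce__(self):
        # Harmless stand-in for an attacker's payload:
        return (exec, ("import builtins; builtins.PWNED = 'code ran on load'",))

blob = pickle.dumps(Payload())
pickle.loads(blob)     # merely loading the data triggers the execution
print(builtins.PWNED)  # -> code ran on load
```

This is exactly the class of attack that safetensors prevents by storing only metadata and raw tensor bytes.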
