How to combine pre-trained weights of components from different multimodal LLMs?

Hi everyone,

I’m currently doing some research on multimodal LLMs, and as you know, an MLLM combines multiple models for vision, text, speech, etc. Most MLLMs have vision/audio encoder(s) to extract features from images, videos, and audio, plus connection modules that adapt those features to the LLM’s embedding space, and usually only the connection modules are trained, with the pre-trained encoders and the LLM kept frozen.

I’m trying to combine some components (e.g. vision encoders) from one MLLM with the architecture of another MLLM, and these MLLMs have their weights stored in safetensors files in Hugging Face repos. So my question is: is there a way to inspect those safetensors files to see which sets of weights correspond to which components of the MLLM? And a possibly more difficult question: can we merge part of the weights from one MLLM’s safetensors (e.g. only the vision encoder’s weights) into the safetensors weights of the other MLLM?

Thanks in advance.


Hi @tcm03,

Your research on multimodal LLMs sounds fascinating! Here are some tips for your questions.

1. Inspecting safetensors Files

Safetensors files store model weights securely and compactly, and their header records each tensor’s name, shape, and dtype, but there is no explicit annotation of which component a given tensor belongs to; the key names are your main clue. To inspect and identify specific components:

  • Use the safetensors Library: The safetensors library allows you to load and manipulate weights programmatically. You can load the file and print the key names, which often correspond to specific components (e.g., vision.encoder.layer.*, text.encoder.layer.*).

    from safetensors.torch import safe_open
    
    file_path = "path/to/model.safetensors"
    with safe_open(file_path, framework="pt") as f:
        # Each key is a full parameter name, e.g. "vision.encoder.layer.0.attention.query.weight"
        for key in f.keys():
            # get_slice() exposes the shape without loading the whole tensor
            print(key, f.get_slice(key).get_shape())
    

    This should give you a map of weight names to their corresponding components.

  • Check Model Documentation: Sometimes the structure of the keys is outlined in the Hugging Face model documentation or associated codebase. If not, examining the configuration files (config.json) in the same repository may help.
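
    For instance, here is a minimal sketch that downloads a repo’s config.json and lists its top-level sections (the repo ID below is a placeholder):

    from huggingface_hub import hf_hub_download
    import json

    # Download config.json from the Hub (replace the repo ID with the real one)
    config_path = hf_hub_download(repo_id="some-org/some-mllm", filename="config.json")
    with open(config_path) as fp:
        config = json.load(fp)

    # Multimodal configs often contain nested sub-configs such as "vision_config" or "text_config"
    print(list(config.keys()))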

2. Combining Weights Across MLLMs

Combining weights from different models can be challenging but feasible if you carefully match architectures:

  • Extract Specific Weights: You can extract the desired weights (e.g., vision encoder) by filtering relevant keys using the safetensors library.

    # Reopen the checkpoint and pull only the vision-encoder tensors into memory
    with safe_open(file_path, framework="pt") as f:
        desired_keys = [key for key in f.keys() if key.startswith("vision.encoder")]
        weights = {key: f.get_tensor(key) for key in desired_keys}
    

    Save these weights into a new safetensors file or load them into another model; see the first sketch after this list for a concrete example.

  • Adapt to Another Model:

    • If the architectures align well, you can directly map weights by ensuring the key names and dimensions match.
    • If there are structural differences (e.g., different layer sizes), you may need to modify the target model’s architecture or interpolate the weights.
  • Transfer Learning Consideration: When integrating parts of two models, freezing the encoders and fine-tuning only the connection modules (as you mentioned) is a reasonable way to adapt the new components; the second sketch after this list shows one way to set this up.

  • Toolkits for Integration: Consider using libraries like Hugging Face Transformers or PEFT for model manipulation. These frameworks make it easier to handle modular architectures and to fine-tune only the parts you change (PEFT also appears in the second sketch below).
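
Putting the extraction and adaptation steps together, here is a minimal sketch. The vision.encoder. and vision_tower. prefixes are placeholders for whatever key names you actually find in the two checkpoints, and target_model stands for the other MLLM instantiated from its own codebase/config:

    from safetensors import safe_open
    from safetensors.torch import save_file

    # 1) Pull only the vision-encoder tensors out of the source checkpoint
    src_path = "path/to/source_mllm.safetensors"
    with safe_open(src_path, framework="pt") as f:
        vision_weights = {
            k: f.get_tensor(k) for k in f.keys() if k.startswith("vision.encoder.")
        }

    # 2) Optionally save them as a standalone safetensors file
    save_file(vision_weights, "vision_encoder_only.safetensors")

    # 3) Rename keys to match the target model's naming scheme
    remapped = {
        k.replace("vision.encoder.", "vision_tower."): v
        for k, v in vision_weights.items()
    }

    # 4) Load into the target model (the other MLLM, built from its own code/config).
    #    strict=False reports missing/unexpected keys instead of raising, which tells
    #    you how well the two architectures line up.
    # missing, unexpected = target_model.load_state_dict(remapped, strict=False)
    # print("Missing keys:", missing)
    # print("Unexpected keys:", unexpected)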

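For the transfer-learning and toolkit points, here is a second minimal sketch that freezes the transplanted encoder and trains only the connection module, with an optional PEFT/LoRA setup for the LLM. Again, target_model, vision_tower, and connector are placeholder names for your combined model:

    # Freeze the transplanted vision encoder so its weights are not updated
    for param in target_model.vision_tower.parameters():
        param.requires_grad = False

    # Optional: add LoRA adapters to the LLM with PEFT while keeping the
    # connection module fully trainable via modules_to_save
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # typical attention projections in the LLM
        modules_to_save=["connector"],        # keep the connection module trainable
    )
    peft_model = get_peft_model(target_model, lora_config)
    peft_model.print_trainable_parameters()
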
Final Thoughts

Carefully validate the performance of the combined model through fine-tuning and evaluation. Also, be cautious about licensing terms when using weights from different repositories.

Hope this helps, and good luck with your research! Feel free to ask if you need further clarification.

Best regards,
Alan

