How to initialize a model with random weights

I’m debugging a script, and since it’s very memory-heavy, my run gets stuck when I am evaluating in the console.

The only solution is to tweak some things and rerun the script. The problem is that for this I need to load the model, which takes a few minutes every time.

Is it possible to keep the configuration exactly as is but skip loading the model weights?


The actual weights are loaded with the from_pretrained() class method. If you are only interested in the model’s skeleton, I’d do something like this:

from transformers import AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel

model_id = "gpt2"  # or whatever model you plan to use

model = AutoModelForCausalLM.from_pretrained(model_id)  # loads the pretrained weights
config = AutoConfig.from_pretrained(model_id)           # loads only the configuration
print(type(model))  # then you see the model class: GPT2LMHeadModel

# Instantiate the same architecture directly from the config,
# with randomly initialized weights:
model2 = GPT2LMHeadModel(config)

If you print out both architectures (just typing model or model2), you’ll see that they are exactly the same, the only difference being that the weights have been randomly initialized in the second case.
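For example, a quick sanity check (just a sketch, reusing model and model2 from the gpt2 snippet above; "transformer.wte.weight" is the GPT-2 token-embedding parameter) would be:

import torch

# Both models expose the same set of parameter names...
assert model.state_dict().keys() == model2.state_dict().keys()

# ...but the values differ, because model2 was randomly initialized.
same = torch.allclose(model.state_dict()["transformer.wte.weight"],
                      model2.state_dict()["transformer.wte.weight"])
print(same)  # False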


It seems like this option does not retrieve the exact same config as from_pretrained(). Here is the example with my code; some elements are taken from the LLaVA repo:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
model = LlavaLlamaForCausalLM(config)

File "/llava/model/language_model/llava_llama.py", line 45, in __init__
    self.model = LlavaLlamaModel(config)
  File "/llava/model/language_model/llava_llama.py", line 38, in __init__
    super(LlavaLlamaModel, self).__init__(config)
  File "/llava/model/llava_arch.py", line 32, in __init__
    super(LlavaMetaModel, self).__init__(config)
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 756, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 299, in __init__
    self.attention_dropout = config.attention_dropout
  File "//lib/python3.10/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
python-BaseException
    return super().__getattribute__(key)
AttributeError: 'LlavaConfig' object has no attribute 'attention_dropout'

The original from_pretrained() call had additional arguments:

model = LlavaLlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    attn_implementation=attn_implementation,
    torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
    **bnb_model_from_pretrained_args
)

Looks like you’re missing some details in that config… but it’s hard to tell without knowing what’s in attn_implementation or bnb_model_from_pretrained_args.

You can manually add to the config whatever you’re missing as follows:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
config.update({"attention_dropout": 0.1})  # add the attribute the loaded config is missing

model = LlavaLlamaForCausalLM(config)

After doing this I don’t get your error anymore, but then it complains about a new attribute (“rope_theta” this time). I guess all these missing attributes are in the additional arguments that you pass to the from_pretrained() method.
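If you’d rather not chase these attributes one by one, a possible workaround (just a sketch, assuming LLaVA’s LlavaConfig is based on LlamaConfig, which carries defaults such as attention_dropout and rope_theta) is to backfill every attribute the loaded config lacks from a default LlamaConfig:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig, LlamaConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')

# Copy over any attribute the loaded config is missing, using the library defaults.
for key, value in LlamaConfig().to_dict().items():
    if not hasattr(config, key):
        config.update({key: value})

model = LlavaLlamaForCausalLM(config)

Note that the default values may not match what the original checkpoint was trained with, so this is only meant to get a randomly initialized skeleton for debugging.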
