How to initialize a model with random weights

I’m debugging a script, and since it’s very memory-heavy, my run gets stuck when I am evaluating in the console.

The only solution is to tweak some things and rerun the script. The problem is that for this I need to load the model, which takes a few minutes every time.

Is it possible to keep the configuration exactly as is but skip loading the model weights?


The actual weights are loaded with the from_pretrained() class method. If you are only interested in the model’s skeleton, I’d do something like this:

from transformers import AutoConfig, AutoModelForCausalLM, GPT2LMHeadModel

model_id = "gpt2"  # or whatever model you plan to use

model = AutoModelForCausalLM.from_pretrained(model_id)  # loads the pretrained weights
config = AutoConfig.from_pretrained(model_id)           # loads only the configuration
print(type(model))  # then you see the model class: GPT2LMHeadModel

# Instantiate the same architecture directly from the config,
# with randomly initialized weights:
model2 = GPT2LMHeadModel(config)

If you print out both architectures (just typing model or model2), you’ll see that they are exactly the same, the only difference being that the weights have been randomly initialized in the second case.
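For example, a quick sanity check (just a sketch, reusing model and model2 from the gpt2 snippet above; "transformer.wte.weight" is the GPT-2 token-embedding parameter) would be:

import torch

# Both models expose the same set of parameter names...
assert model.state_dict().keys() == model2.state_dict().keys()

# ...but the values differ, because model2 was randomly initialized.
same = torch.allclose(model.state_dict()["transformer.wte.weight"],
                      model2.state_dict()["transformer.wte.weight"])
print(same)  # False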


It seems like this option does not retrieve the exact same config as from_pretrained(). Here is the example with my code; some elements are taken from the LLaVA repo:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
model = LlavaLlamaForCausalLM(config)

File "/llava/model/language_model/llava_llama.py", line 45, in __init__
    self.model = LlavaLlamaModel(config)
  File "/llava/model/language_model/llava_llama.py", line 38, in __init__
    super(LlavaLlamaModel, self).__init__(config)
  File "/llava/model/llava_arch.py", line 32, in __init__
    super(LlavaMetaModel, self).__init__(config)
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in __init__
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in <listcomp>
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 756, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
  File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 299, in __init__
    self.attention_dropout = config.attention_dropout
  File "//lib/python3.10/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
python-BaseException
    return super().__getattribute__(key)
AttributeError: 'LlavaConfig' object has no attribute 'attention_dropout'

The original from_pretrained() call had additional arguments:

model = LlavaLlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    attn_implementation=attn_implementation,
    torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
    **bnb_model_from_pretrained_args
)

Looks like you’re missing some details in that config… but it’s hard to tell without knowing what’s in attn_implementation or bnb_model_from_pretrained_args.

You can manually add to the config whatever you’re missing as follows:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
config.update({"attention_dropout": 0.1})  # add the attribute the loaded config is missing

model = LlavaLlamaForCausalLM(config)

After doing this I don’t get your error anymore, but then it complains about a new attribute (“rope_theta” this time). I guess all these missing attributes are in the additional arguments that you pass to the from_pretrained() method.
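If you’d rather not chase these attributes one by one, a possible workaround (just a sketch, assuming LLaVA’s LlavaConfig is based on LlamaConfig, which carries defaults such as attention_dropout and rope_theta) is to backfill every attribute the loaded config lacks from a default LlamaConfig:

from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig, LlamaConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')

# Copy over any attribute the loaded config is missing, using the library defaults.
for key, value in LlamaConfig().to_dict().items():
    if not hasattr(config, key):
        config.update({key: value})

model = LlavaLlamaForCausalLM(config)

Note that the default values may not match what the original checkpoint was trained with, so this is only meant to get a randomly initialized skeleton for debugging.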
