I’m debugging a script that is very memory heavy, so my run gets stuck when I evaluate things in the console.
The only solution is to tweak a few things and rerun the script. The problem is that every rerun has to load the model again, which takes a few minutes.
Is it possible to keep the configuration exactly as is but skip loading the model weights?
The actual weights are loaded with the from_pretrained() class method. If you are only interested in the model’s skeleton, I’d do something like this:
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "gpt2"  # or whatever model you plan to use

# Loaded with weights here only so we can inspect the concrete model class
model = AutoModelForCausalLM.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)
print(type(model))  # then you see the model class: GPT2LMHeadModel

# Build the same architecture from the config alone, without loading any weights
from transformers import GPT2LMHeadModel
model2 = GPT2LMHeadModel(config)
If you print out both architectures (just typing model or model2), you’ll see that they are exactly the same; the only difference is that the weights have been randomly initialized in the second case.
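By the way, if you’d rather not call from_pretrained() on the model at all, the Auto classes also expose a from_config() class method that builds the skeleton straight from the config. A minimal sketch with the same gpt2 example (only the small config JSON gets downloaded, never the checkpoint):

from transformers import AutoConfig, AutoModelForCausalLM

# Fetch only the config (a small JSON file), not the weights
config = AutoConfig.from_pretrained("gpt2")

# Instantiate the architecture with randomly initialized weights
model = AutoModelForCausalLM.from_config(config)
print(type(model))  # GPT2LMHeadModel, same class as before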
It seems like this option does not retrieve the exact same config as from_pretrained(). Here is an example with my code; some elements are taken from the LLaVA repo:
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
model = LlavaLlamaForCausalLM(config)  # fails, see traceback below
File "/llava/model/language_model/llava_llama.py", line 45, in __init__
self.model = LlavaLlamaModel(config)
File "/llava/model/language_model/llava_llama.py", line 38, in __init__
super(LlavaLlamaModel, self).__init__(config)
File "/llava/model/llava_arch.py", line 32, in __init__
super(LlavaMetaModel, self).__init__(config)
File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in __init__
[LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 956, in <listcomp>
[LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 756, in __init__
self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
File "//lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 299, in __init__
self.attention_dropout = config.attention_dropout
File "//lib/python3.10/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
python-BaseException
return super().__getattribute__(key)
AttributeError: 'LlavaConfig' object has no attribute 'attention_dropout'
The original from_pretrained() call had additional arguments:
model = LlavaLlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    attn_implementation=attn_implementation,
    torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
    **bnb_model_from_pretrained_args
)
Looks like you’re missing some details in that config… but it’s hard to tell without knowing what’s in attn_implementation or bnb_model_from_pretrained_args.
You can manually add whatever’s missing to the config, like this:
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from transformers import AutoConfig

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')
config.update({"attention_dropout": 0.1})  # add whatever attribute is missing
model = LlavaLlamaForCausalLM(config)
After doing this I don’t get your error anymore, but then it complains about a new attribute (“rope_theta” this time). I guess all these missing attributes are in the additional arguments that you pass to the from_pretrained() method.
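If adding them one at a time gets tedious, a rough sketch of a workaround (my own idea, not something from the LLaVA repo, so treat it as an assumption) would be to backfill every default attribute that a plain LlamaConfig defines but your LlavaConfig is missing:

from transformers import AutoConfig, LlamaConfig
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM

config = AutoConfig.from_pretrained('liuhaotian/llava-v1.5-13b')

# Copy over any default LlamaConfig attribute (attention_dropout, rope_theta, ...)
# that the serialized LlavaConfig does not define
for key, value in LlamaConfig().to_dict().items():
    if not hasattr(config, key):
        setattr(config, key, value)

model = LlavaLlamaForCausalLM(config)  # skeleton only, weights stay random

Whether falling back to Llama defaults is what LLaVA intends is something I can’t verify, so double-check the values against the arguments you normally pass to from_pretrained().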