I faced the same issue when replicating Bigger, Better, Faster.
As far as I understand, weight initialization is only applied to a model before you train it; I didn't test whether it is possible to overwrite the initialization before training. You have to construct the specific layers you want to reinitialize yourself, initialize them, and then assign them back into the model, roughly like this:
model = AutoModelForCausalLM.from_config(config)
# for a LLaMA model in transformers the block class is LlamaDecoderLayer
# (recent versions also expect the layer index)
new_block = LlamaDecoderLayer(config, layer_idx=len(model.model.layers) - 1)
new_block.apply(model._init_weights)       # reuse the model's own init scheme
model.model.layers[-1] = new_block         # swap the fresh block in
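
In case it helps, here is a rough, self-contained sketch of the same idea for the BBF-style "reset the last blocks" case. It assumes a LLaMA-style model from transformers; the checkpoint name and the number of blocks to reset are placeholders, and LlamaDecoderLayer(config, layer_idx=...) matches recent transformers versions (older ones take only the config):

from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint
config = model.config

n_reset = 2                            # how many of the last blocks to reinitialize (placeholder)
n_layers = len(model.model.layers)

for idx in range(n_layers - n_reset, n_layers):
    # build a fresh block, apply the model's own weight init,
    # and match the device/dtype of the rest of the network
    fresh = LlamaDecoderLayer(config, layer_idx=idx)
    fresh.apply(model._init_weights)
    fresh = fresh.to(device=model.device, dtype=model.dtype)
    model.model.layers[idx] = fresh

After the swap you can train as usual: the fresh blocks start from their initialization while the rest of the model keeps its existing weights.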