Hi. I want to train LLaMA from scratch.
I don’t want to fine-tune it; I want to try pretraining it from scratch and see how it goes.
How could I do it?
Thanks
Generally, you initialize the model with random weights, as shown here, and then train it like any other model.
Except that, in practice, you can’t: to understand why, check Table 1 and Table 15 in the LLaMA paper, which lay out the data and compute that pre-training took. However, I’d really like to hear back from you if you actually manage to train LLaMA from scratch.
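For reference, a rough sketch of that recipe with the Transformers Trainer is below. Everything here is a placeholder, not a working pre-training setup: the toy corpus and all hyperparameters are made up, and access to the meta-llama/Llama-2-7b-hf config and tokenizer on the Hub is gated, so you need approved access first.

from datasets import Dataset
from transformers import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Architecture only: from_config gives a randomly initialized model, no pretrained weights
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_config(config)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

# Toy corpus stand-in; a real pre-training run needs a large tokenized dataset
texts = ["hello world, this is a toy pre-training corpus"] * 64
train_dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels, no masking
args = TrainingArguments(output_dir="llama-from-scratch", per_device_train_batch_size=1,
                         gradient_accumulation_steps=8, learning_rate=3e-4, max_steps=100)
Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator).train()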
Thanks a lot
I just wanted to know how to pre-train LLaMA and try it for fun.
I did not intend to actually use the custom pre-trained model.
Updated link to the LLaMA paper: https://arxiv.org/pdf/2302.13971.pdf
I am also interested in initializing an empty Llama, just for the sake of faster development time, instead of loading the full Llama weights each time.
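Something like the following seems enough for that, as a minimal sketch: all the dimensions below are made up, and building a LlamaConfig directly never touches the Hub, so it instantiates almost instantly.

from transformers import LlamaConfig, LlamaForCausalLM

# Deliberately tiny, made-up dimensions: no weight download, builds in seconds
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=128,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
)
model = LlamaForCausalLM(config)  # randomly initialized "empty" Llama
print(sum(p.numel() for p in model.parameters()), "parameters")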
I have a small ‘Llama’ model (~1M parameters) built by changing the config, and I am trying to train it with my data. However, the model performs terribly on a small dataset (it cannot even overfit), and I don’t know why. Here is the link to my problem: Failed to train Llama model
If possible, could I have a small demo of how to train it?
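A minimal overfitting sanity check might look like the sketch below; every size and hyperparameter in it is made up. The idea is that a tiny, randomly initialized Llama should be able to memorize a handful of fixed token sequences; if the loss does not head toward zero, the training setup (learning rate, labels, masking) is the first thing to check rather than the model size.

import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny made-up config; the point is only to check that training can memorize a fixed batch
config = LlamaConfig(vocab_size=256, hidden_size=64, intermediate_size=256,
                     num_hidden_layers=2, num_attention_heads=4, num_key_value_heads=4,
                     max_position_embeddings=128)
model = LlamaForCausalLM(config)

data = torch.randint(0, 256, (8, 32))  # 8 fixed random token sequences to memorize
optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
model.train()
for step in range(300):
    out = model(input_ids=data, labels=data)  # labels=input_ids gives the shifted causal-LM loss
    out.loss.backward()
    optim.step()
    optim.zero_grad()
    if step % 50 == 0:
        print(step, out.loss.item())
# If the loss does not approach zero here, something in the setup (lr, data, labels) is off.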
For those who want to build Llama from scratch, this article might be quite useful.
I tried just reinitializing the layers directly but it didn’t work:
import time

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoConfig


def main_reinit_model():
    """
    Build LLaMA v2 from its config alone and re-initialize its weights.

    ref: https://stackoverflow.com/questions/76971761/how-to-adapt-llama-v2-model-to-less-than-7b-parameters
    ref: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L721
    ref: https://chat.openai.com/c/977d0cb0-b819-48ac-be5c-6e482ad5e518
    """
    print('Starting to reinitialize the model...')
    # Load the pretrained LLaMA v2 config (architecture only, no weights)
    config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
    # Build the model from the config (randomly initialized, no pretrained weights)
    model = AutoModelForCausalLM.from_config(config)
    # Put the model on cuda and confirm its device
    model = model.to('cuda')
    print(f'{model.device=}')
    # print("Original number of parameters:", sum(p.numel() for p in model.parameters()))
    # Sum the L1 norm of all parameters so we can tell whether re-initialization changed them
    norm_model = sum(p.norm(1) for p in model.parameters())
    print(f'{norm_model=}')
    # To inspect all layers:
    # for name, param in model.named_parameters():
    #     print(f'{name=} {param.shape=}')
    # Attempts that didn't work:
    # model.init_weights()               # didn't work
    # model._init_weights(module)        # didn't work, needs a module argument
    # model._initialize_weights(module)  # didn't work, needs a module argument
    # for name, param in model.named_parameters():
    #     model._init_weights(param)
    # model.post_init()
    # Re-initialize every linear layer with normal(mean=0, std=0.02)
    reinitialize_weights(model)
    # Recompute the L1 norm to check that the weights actually changed
    norm_model = sum(p.norm(1) for p in model.parameters())
    print(f'{norm_model=}')


def reinitialize_weights(model) -> None:
    """Re-initialize every nn.Linear layer in the model with normal(mean=0, std=0.02)."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0, std=0.02)
            if module.bias is not None:
                nn.init.constant_(module.bias, 0)


def _init_weights(self, module):
    # Module-level copy of Hugging Face's LlamaPreTrainedModel._init_weights, kept for
    # reference and never called here (note: the official version uses mean=0.0, not 100.0).
    std = self.config.initializer_range
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=100.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()


def main_generate_smaller_model():
    """
    Build a smaller LLaMA model by shrinking the config.

    ref: https://stackoverflow.com/questions/76971761/how-to-adapt-llama-v2-model-to-less-than-7b-parameters
    """
    print('Starting to generate a smaller model...')
    # Load the pretrained LLaMA v2 config
    config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
    print(f'config: {config} {type(config)}')
    # Print the original number of parameters
    model = AutoModelForCausalLM.from_config(config)
    print("Original number of parameters:", sum(p.numel() for p in model.parameters()))
    # Modify the config to reduce the model size
    config.hidden_size = 2048
    config.num_hidden_layers = 12
    # Create a new, smaller model from the modified config
    smaller_model = AutoModelForCausalLM.from_config(config)
    print("New number of parameters:", sum(p.numel() for p in smaller_model.parameters()))


if __name__ == '__main__':
    start = time.time()
    # main_generate_smaller_model()
    main_reinit_model()
    print(f'Done! Time taken: {time.time() - start:.2f} seconds\a\a\a')
Anyone know why it fails?
ref: How does one reinitialize the weights of a Hugging Face LLaMA v2 model the official way as the original model? (Stack Overflow)
ref: How does one reinitialize the weights of a Hugging Face LLaMA v2 model the official way as the original model?
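One possible angle, untested here and leaning on Transformers internals (_init_weights is a private method, so this is a best-effort sketch rather than a supported API): a model built with from_config should already come out randomly initialized, and if you still want to re-run the model’s own init on every submodule, apply can do it.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_config(config)  # should already be randomly initialized

# Re-run the model's own (private) init on every submodule; uses Transformers internals,
# so treat it as an experiment, not a supported API.
model.apply(model._init_weights)

print(sum(p.norm(1) for p in model.parameters()))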