How to get a LLaMA v2 model with less than 7B parameters?

I want to pre-train a decoder-only (causal LM) model with fewer than 7B parameters (since models at 7B and above are unstable during training, I want to do everything I can to make sure the pre-training run goes smoothly with minimal babysitting).

Given how nice the pre-training curves for LLaMA v2 (llama2) look, I want to try that architecture.

What I need is:

  1. be able to instantiate a Llama 2 architecture with fewer parameters (e.g., by decreasing the hidden width or the number of layers), and
  2. then randomly initialize its weights (no pretrained weights).

How do I do the above?
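For reference, one way to see which config fields control the model size is to print the original 7B config (a sketch; it assumes access to the gated `meta-llama/Llama-2-7b-hf` repo, otherwise any local Llama 2 config works):

    from transformers import AutoConfig

    # The printed config lists the size-related fields: hidden_size, num_hidden_layers,
    # num_attention_heads, intermediate_size, vocab_size, etc.
    config = AutoConfig.from_pretrained('meta-llama/Llama-2-7b-hf', use_auth_token=True)
    print(config)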

Some initial code:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

    # Use bf16 if the GPU supports it (compute capability >= 8), otherwise fall back to fp32.
    bf16 = torch.cuda.get_device_capability(torch.cuda.current_device())[0] >= 8
    torch_dtype = torch.bfloat16 if bf16 else torch.float32
    # torch_dtype = torch.float32  # or force fp32 unconditionally

    pretrained_model_name_or_path = 'meta-llama/Llama-2-7b-hf'
    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path,
        # quantization_config=quantization_config,  # e.g., a BitsAndBytesConfig for 4/8-bit loading
        # device_map=device_map,  # device_map = None  https://github.com/huggingface/trl/blob/01c4a35928f41ba25b1d0032a085519b8065c843/examples/scripts/sft_trainer.py#L82
        trust_remote_code=True,
        torch_dtype=torch_dtype,
        use_auth_token=True,
    )
    print(f'{pretrained_model_name_or_path=}')
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path, use_auth_token=True)
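To check whether whatever model gets built is actually under the 7B budget, the parameter count can be computed with plain PyTorch (nothing model-specific assumed here):

    # Total parameter count of whatever `model` currently holds
    # (the full Llama-2-7b checkpoint is ~6.7B; a shrunk config should land far below that).
    num_params = sum(p.numel() for p in model.parameters())
    print(f'{num_params=:,} (~{num_params / 1e9:.2f}B)')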

and another attempt, building a smaller config directly:

    from transformers import AutoModelForCausalLM, LlamaConfig

    # Build a scaled-down Llama config; every field not passed keeps its Llama default
    # (num_attention_heads, intermediate_size, vocab_size, etc. can also be reduced).
    config = LlamaConfig(
        hidden_size=2048,
        num_hidden_layers=24,
    )

    # from_config builds the architecture with randomly initialized weights;
    # no pretrained checkpoint is downloaded or loaded.
    model = AutoModelForCausalLM.from_config(config)

    print(model)
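A quick sanity check that the randomly initialized model actually runs (a sketch; the batch of token ids is random data, not real text):

    import torch

    # Dummy batch of token ids drawn from the model's vocabulary.
    input_ids = torch.randint(0, config.vocab_size, (1, 16))
    with torch.no_grad():
        out = model(input_ids)
    print(out.logits.shape)  # expected: (1, 16, vocab_size)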

Maybe this?

    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
    import torch

    # Load the configuration of the LLaMA v2 7B model (gated repo, needs auth)
    config = AutoConfig.from_pretrained('meta-llama/Llama-2-7b-hf', use_auth_token=True)

    # Modify the configuration to reduce the model size.
    # Note: Llama configs use hidden_size / num_hidden_layers / num_attention_heads,
    # not the GPT-2 style n_embd / n_layer / n_head.
    config.hidden_size = 768          # decrease for a smaller width
    config.num_hidden_layers = 8      # decrease for fewer layers
    config.num_attention_heads = 12   # must divide hidden_size evenly (768 / 12 = 64)
    config.num_key_value_heads = 12   # keep equal to num_attention_heads (no GQA)
    config.intermediate_size = 2048   # optionally shrink the MLP width as well

    # Initialize a model with the modified configuration (random weights, nothing pretrained)
    model = AutoModelForCausalLM.from_config(config)

    # Initialize the tokenizer (the vocabulary stays the same as the 7B model)
    tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf', use_auth_token=True)

    # If you want to use a specific dtype
    torch_dtype = torch.float32
    model.to(dtype=torch_dtype)
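If the reduced model is meant to be consumed by a separate pre-training script, one option (a sketch; the output directory name is arbitrary) is to save the freshly initialized model and tokenizer to disk and point the trainer at that path:

    # Save the randomly initialized model + tokenizer; a training script can then
    # load them back with from_pretrained(output_dir).
    output_dir = './llama2-small-random-init'  # arbitrary local path
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)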


Note: a different model that satisfies the above conditions could also work, but I think Llama 2 is the ideal answer.
