How to increase the width of hidden linear layers in Mistral 7B model?

alvations · March 5, 2024, 12:42pm

After installing the dependencies:

!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U peft
!pip install -U accelerate
!pip install -U trl

And then some boilerplate to load the Mistral model:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer

import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

base_model = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # the KV cache is incompatible with gradient checkpointing; this also silences the warning
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token

We can see the layers/architecture:

>>> model

[out]:

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)

Is there any way to increase the width of the Linear4bit layers?

E.g., if we want every layer to take another 800 hidden nodes (4096 -> 4896), to get:

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4896)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (k_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4896, out_features=32000, bias=False)
)
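To put a number on that change, here is a small back-of-the-envelope calculation. It is only a sketch: the helper name `linear_params_per_layer` is made up for illustration, it counts only the seven bias-free linear projections shown in the printouts above, and it ignores the embeddings, the norms and `lm_head`.

```python
def linear_params_per_layer(hidden: int, kv_dim: int = 1024, mlp_dim: int = 14336) -> int:
    """Count weights in the seven bias-free linear projections of one decoder layer."""
    attn = hidden * hidden      # q_proj
    attn += hidden * kv_dim     # k_proj
    attn += hidden * kv_dim     # v_proj
    attn += hidden * hidden     # o_proj
    mlp = hidden * mlp_dim      # gate_proj
    mlp += hidden * mlp_dim     # up_proj
    mlp += mlp_dim * hidden     # down_proj
    return attn + mlp


current = linear_params_per_layer(4096)  # shapes from the first printout
target = linear_params_per_layer(4896)   # shapes from the desired printout
extra_per_layer = target - current
extra_total = 32 * extra_per_layer       # 32 decoder layers

print(f"per layer: {current:,} -> {target:,} (+{extra_per_layer:,})")
print(f"all 32 layers: +{extra_total:,} new weights")
```

All of those extra weights have no pretrained values to copy from, so they would have to be initialized somehow and then trained.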

It’s okay if the additional hidden nodes in the Linear4bit layers are randomly initialized.
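To make the idea concrete, this is roughly what I mean for a single layer. It is a minimal sketch under some assumptions: it works on a plain, unquantized `torch.nn.Linear` rather than a bitsandbytes `Linear4bit`, the helper name `widen_linear` and the `std=0.02` default are made up for illustration, and the sizes are tiny stand-ins. It copies the existing weights into the top-left corner of a larger matrix and leaves the rest randomly initialized.

```python
import torch
import torch.nn as nn


def widen_linear(old: nn.Linear, new_in: int, new_out: int, std: float = 0.02) -> nn.Linear:
    """Return a wider copy of `old`; weights with no old counterpart are random."""
    if new_in < old.in_features or new_out < old.out_features:
        raise ValueError("new dimensions must not be smaller than the old ones")

    new = nn.Linear(
        new_in,
        new_out,
        bias=old.bias is not None,
        dtype=old.weight.dtype,
        device=old.weight.device,
    )
    with torch.no_grad():
        # Random init everywhere first (std is an assumed value, not Mistral's).
        nn.init.normal_(new.weight, mean=0.0, std=std)
        # Then overwrite the top-left block with the existing weights.
        new.weight[: old.out_features, : old.in_features] = old.weight
        if old.bias is not None:
            new.bias.zero_()
            new.bias[: old.out_features] = old.bias
    return new


# Tiny stand-in for a 4096 -> 4896 widening, so it runs anywhere.
old = nn.Linear(8, 8, bias=False)
new = widen_linear(old, new_in=10, new_out=10)
print(new)
```

Doing this for the whole model would presumably also mean resizing everything else that depends on the hidden size (the embedding, the RMSNorm weights, `lm_head`, and the config), and I am not sure how that interacts with loading in 4-bit.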

alvations · March 5, 2024, 12:43pm

Also asked on Stack Overflow: python - How to increase the width of hidden linear layers in Mistral 7B model?
