Config.json is not saving after finetuning Llama 2

After fine-tuning, I'm not able to save a config.json file using trainer.model.save_pretrained.

My pip install:

!pip install torch datasets
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

Model code:

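import torch
import transformers
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

model_name = 'meta-llama/Llama-2-7b-chat-hf'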

model_config = transformers.AutoConfig.from_pretrained(
    model_name,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)

huggingface_dataset_name = "mlabonne/guanaco-llama2-1k"

#dataset = load_dataset(huggingface_dataset_name, "pqa_labeled", split = "train")

dataset = load_dataset(huggingface_dataset_name, split="train")

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

training_arguments = TrainingArguments(
    output_dir="random_weights",
    fp16=True,
    learning_rate=1e-5,
    num_train_epochs=5,
    weight_decay=0.01,
    logging_steps=1,
    max_steps=1)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_arguments
)

# Train model
trainer.train()

trainer.model.save_pretrained("finetuned_llama")

Now, when I go to load the model, it complains about not having a config.json. I've also tried explicitly saving the config using trainer.model.config.save_pretrained, but had no luck.

I might be missing something obvious. Any ideas on what might be going wrong?

Did you fix that?

I think it is because you are using LoRA, which trains an adapter model rather than the full model. Try calling model.merge_and_unload() to merge the adapter with your base model before saving.

Same problem with me also.

Can you please elaborate?

If you train a model with LoRA (low-rank adaptation), you only train adapters on top of the base model. For example, if you fine-tune Llama with LoRA, you only add a small number of trainable low-rank weight matrices (so-called adapters) to the original (also called base) model, whose own weights stay frozen. Hence calling save_pretrained() or push_to_hub() will only save 2 things:

  • the adapter configuration (in an adapter_config.json file)
  • the adapter weights (typically in a safetensors file).
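
To check which of the two cases a saved folder falls into, a small helper along these lines may be useful. It is only a sketch: the function name describe_checkpoint is made up for illustration, and it relies only on the file names mentioned in this thread (adapter_config.json for an adapter, config.json for a full model).

```python
from pathlib import Path


def describe_checkpoint(folder):
    """Classify a save_pretrained() output folder by the config files it contains."""
    path = Path(folder)
    has_adapter = (path / "adapter_config.json").is_file()
    has_full = (path / "config.json").is_file()
    if has_adapter and has_full:
        return "full model and adapter"
    if has_adapter:
        return "adapter only"
    if has_full:
        return "full model"
    return "unknown"
```

For the "finetuned_llama" folder from the question, this would be expected to report "adapter only", which is why loading it as a regular model fails with a missing config.json.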

See, for example, the ybelkada/opt-350m-lora repository, where OPT-350m is the base model.

In order to merge these adapter layers back into the base model, one can call the merge_and_unload method. Afterwards, you can call save_pretrained() on it which will save both the weights and the configuration in a config.json file:

from transformers import AutoModelForCausalLM
from peft import PeftModel

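# base_model_name: the model you fine-tuned from, e.g. "meta-llama/Llama-2-7b-chat-hf"
# adapter_model_name: the folder or repo holding the adapter, e.g. "finetuned_llama"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)

# Fold the adapter weights into the base weights and drop the PEFT wrapper
model = model.merge_and_unload()
# Saves the merged weights together with config.json
model.save_pretrained("my_model")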
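
Conceptually, merging folds each adapter's low-rank update into the weight matrix it is attached to, so the result has the same shape as the original weight and no separate adapter is needed. The snippet below is a toy illustration in plain Python of the standard LoRA formula W' = W + (alpha / r) * B A. It is not PEFT's actual implementation, and the matrices are made-up values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter (r = 1); all values are illustrative.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features)
B = [[0.5], [0.0]]  # shape (out_features, r)

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

With the settings from the question (lora_alpha = 16, r = 64), the scaling factor alpha / r would be 0.25.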

One feature of the Transformers library is that it has PEFT integration, which means that you can call from_pretrained() directly on a folder/repository that only contains this adapter_config.json file and the adapter weights, and it will automatically load the weights of the base model + adapters. See PEFT integrations. Hence we could also just have done this:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(folder_containing_only_adapter_weights)
model = model.merge_and_unload()
model.save_pretrained("my_model")
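
Note that merge_and_unload() is a method of PEFT's model classes. If the model returned by AutoModelForCausalLM.from_pretrained() does not expose it in your installed versions, one alternative is to load the adapter folder through AutoPeftModelForCausalLM from the peft package, which returns a PEFT model. The sketch below assumes a peft release that provides that class (newer than the 0.4.0 pinned in the question, as far as I know); the function name merge_adapter_folder and its arguments are illustrative.

```python
from pathlib import Path


def merge_adapter_folder(adapter_dir, output_dir):
    """Load an adapter-only folder, merge it into its base model, and save the result."""
    adapter_dir = Path(adapter_dir)
    if not (adapter_dir / "adapter_config.json").is_file():
        raise FileNotFoundError(f"No adapter_config.json found in {adapter_dir}")

    # Imported lazily so the folder check above works without the heavy dependencies.
    import torch
    from peft import AutoPeftModelForCausalLM

    # The base model name is read from adapter_config.json. Loading in half
    # precision (an assumption here) instead of 4-bit keeps the merge straightforward.
    model = AutoPeftModelForCausalLM.from_pretrained(
        str(adapter_dir), torch_dtype=torch.float16
    )
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)  # writes the merged weights and config.json
    return merged
```

It could be called as merge_adapter_folder("finetuned_llama", "my_model"), using the folder names that appear earlier in this thread.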

I got the following error while running the code above:

PS C:\Users\Anonymous\Desktop\AIRL Lab\MedLLM> python -u "c:\Users\Anonymous\Desktop\AIRL Lab\MedLLM\HF Connection (Temporary Use)\merging.py"
config.json: 100%|██████████| 699/699 [00:00<?, ?B/s]
model.safetensors: 100%|██████████| 2.20G/2.20G [14:18<00:00, 2.56MB/s]
generation_config.json: 100%|██████████| 124/124 [00:00<?, ?B/s]
Traceback (most recent call last):
  File "c:\Users\Anonymous\Desktop\AIRL Lab\MedLLM\HF Connection (Temporary Use)\merging.py", line 8, in <module>
    model = PeftModel.from_pretrained(model, adapter_model_name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anonymous\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\peft\peft_model.py", line 356, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "C:\Users\Anonymous\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\peft\peft_model.py", line 727, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anonymous\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\peft\utils\save_and_load.py", line 326, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Anonymous\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\safetensors\torch.py", line 311, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
PS C:\Users\Anonymous\Desktop\AIRL Lab\MedLLM>
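
The traceback shows the failure happening while the adapter's safetensors file is being read, after the base model has already downloaded. A header error like this can occur when the file is empty, truncated, or not actually a safetensors file (for example a Git LFS pointer); whether that applies here is not confirmed in this thread. A rough way to inspect a file by hand, based on the safetensors layout (an 8-byte little-endian header length followed by a JSON header), is sketched below. The function name check_safetensors_header is made up for illustration.

```python
import json
import struct
from pathlib import Path


def check_safetensors_header(filename):
    """Return the parsed JSON header of a safetensors file, or raise ValueError."""
    file_size = Path(filename).stat().st_size
    if file_size < 8:
        raise ValueError(f"file is only {file_size} bytes, too small for a header")
    with open(filename, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        if header_len > file_size - 8:
            raise ValueError("declared header length exceeds the file size")
        raw = f.read(header_len)
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"header is not valid JSON: {exc}") from exc
```

Running it on the adapter weights file (for instance adapter_model.safetensors inside the adapter folder, assuming that is where the adapter was saved) should either return a dictionary of tensor names or raise a ValueError that hints at what is wrong with the file.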