Config.json is not saving after finetuning Llama 2

If you train a model with LoRA (low-rank adaptation), you only train adapters on top of the base model. For example, if you fine-tune Llama 2 with LoRA, you only add a couple of small linear layers (the so-called adapters) on top of the original (also called base) model. Hence calling save_pretrained() or push_to_hub() will only save two things (see the sketch after this list):

  • the adapter configuration (in an adapter_config.json file)
  • the adapter weights (typically in an adapter_model.safetensors file).
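
To illustrate, here's a minimal sketch of that behavior. The LoRA hyperparameters and target_modules are assumptions for demonstration, not values from the original thread:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model (OPT-350m, the base model of the example repo below)
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Assumed LoRA settings, purely for illustration
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# This writes only adapter_config.json and adapter_model.safetensors;
# the base model's config.json and weights are NOT saved
model.save_pretrained("opt-350m-lora")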

See for example the ybelkada/opt-350m-lora repository. Here, OPT-350m is the base model.

In order to merge these adapter layers back into the base model, one can call the merge_and_unload() method. Afterwards, you can call save_pretrained() on the result, which will save the full model weights along with the configuration in a config.json file:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_name = "meta-llama/Llama-2-7b-hf"  # replace with your base model
adapter_model_name = "path/to/your/adapter"   # folder or repo containing adapter_config.json

# Load the base model, then attach the LoRA adapters on top of it
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)

# Merge the adapter weights into the base model and drop the adapter layers
model = model.merge_and_unload()

# save_pretrained() now writes the full weights and a config.json
model.save_pretrained("my_model")
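
As a quick sanity check, you can list the output directory; it should now contain the config.json (the exact file names may vary with your Transformers version and model size):

import os

# Expect something like ['config.json', 'generation_config.json', 'model.safetensors', ...]
print(sorted(os.listdir("my_model")))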

The Transformers library also has built-in PEFT integration, which means that you can call from_pretrained() directly on a folder/repository that contains only the adapter_config.json file and the adapter weights, and it will automatically load the weights of the base model plus the adapters. See the PEFT integrations docs. Hence we could also just have done this:

from transformers import AutoModelForCausalLM

folder_containing_only_adapter_weights = "path/to/your/adapter"  # adapter files only

# Thanks to the PEFT integration, pointing from_pretrained() at a folder
# that contains only adapter_config.json + adapter weights loads the
# base model and the adapters automatically
model = AutoModelForCausalLM.from_pretrained(folder_containing_only_adapter_weights)
model.save_pretrained("my_model")
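
Either way, once the merged checkpoint is saved with its own config.json, it behaves like any regular Transformers model (a small sketch; the directory name matches the examples above):

from transformers import AutoModelForCausalLM

# The merged checkpoint is now standalone, no PEFT required to load it
model = AutoModelForCausalLM.from_pretrained("my_model")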