Merged LoRA & text generation inference issues

Hi,

I have finetuned falcon-7b for a specific task using the PEFT library, specifically a LoRA adapter. The finetuning works well, and I wanted to use the result with text-generation-inference (here).
The Falcon model is supported, but PEFT is not, so I merged my LoRA weights into the base model. I can use this merged model with transformers AutoModel, but not with text-generation-inference. Here are the error messages:

Torch: RuntimeError: weight transformer.word_embeddings.weight does not exist
Safetensors: RuntimeError: weight lm_head.weight does not exist, and indeed there is no lm_head field in the config.
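
For context, the merge itself was done roughly like this (a minimal sketch; the adapter path is a placeholder, using PEFT's merge_and_unload):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", trust_remote_code=True, torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "path/to/my_lora_adapter")  # placeholder path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

merged.save_pretrained("falcon-7b-merged", safe_serialization=True)
AutoTokenizer.from_pretrained("tiiuae/falcon-7b").save_pretrained("falcon-7b-merged")
```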

Any clues on what I should do?

Could you show the finetuning code? It's hard to see where this error happens. It looks to me like you finetuned the model with the wrong attention blocks, but maybe I'm wrong.

Of course, here it is.

def train(self, training_text: str, lora_name: str, **kwargs):
        assert self.model is not None
        assert self.tokenizer is not None

        kwargs = {**TRAINING_PARAMS, **LORA_TRAINING_PARAMS, **kwargs}
        train_dataset = self.tokenize_dataset(
            training_text, kwargs["dataset_max_size"], kwargs["max_sequence_length"]
        )

        args = {}

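        # Falcon layer names for LoRA: attention projections (query_key_value, dense)
        # and the MLP layers (dense_h_to_4h, dense_4h_to_h)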
        if "tiiuae/falcon" in self.model_name:
            args = {
                "target_modules": [
                    "query_key_value",
                    "dense",
                    "dense_h_to_4h",
                    "dense_4h_to_h",
                ]
            }
        if kwargs["is_gpt2"] == True:
            args["fan_in_fan_out"] = True

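        # Freeze the base weights for k-bit training, then wrap the model with the LoRA adapter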
        self.model = peft.prepare_model_for_kbit_training(self.model)
        self.model = peft.get_peft_model(
            self.model,
            peft.LoraConfig(
                r=kwargs["lora_r"],
                lora_alpha=kwargs["lora_alpha"],
                lora_dropout=kwargs["lora_dropout"],
                bias="none",
                task_type="CAUSAL_LM",
                **args,
            ),
        )

        if not os.path.exists(LORA_DIR):
            os.makedirs(LORA_DIR)

        sanitized_model_name = sanitize_model_name(self.model_name)
        output_dir = f"{LORA_DIR}/{sanitized_model_name}_{lora_name}"

        training_args = TrainingArguments(
            per_device_train_batch_size=kwargs["micro_batch_size"],
            gradient_accumulation_steps=kwargs["gradient_accumulation_steps"],
            num_train_epochs=kwargs["epochs"],
            learning_rate=kwargs["learning_rate"],
            warmup_steps=math.floor(len(train_dataset) * 0.05),
            fp16=True,
            optim="adamw_torch",
            logging_steps=5,
            save_total_limit=3,
            save_steps=0,
            output_dir=output_dir,
        )

        self.trainer = Trainer(
            model=self.model,
            train_dataset=train_dataset,
            args=training_args,
            data_collator=DataCollatorForLanguageModeling(
                self.tokenizer,
                mlm=False,
            )
        )

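        # The generation (KV) cache isn't needed during training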
        self.model.config.use_cache = False

        self.trainer.train(resume_from_checkpoint=False)

        self.model.save_pretrained(output_dir)

Honestly I'm not really sure, but I think you are replacing some layers with others by setting `args["fan_in_fan_out"] = True`, and this could be the reason.

For tiiuae/falcon-7b I've had luck fine-tuning and performing inference with PEFT.

The main difference in my code is that I'm only targeting "query_key_value".
Maybe give that a shot. I just read the original LoRA paper last night, and their finding was that targeting just the query and value projections is likely sufficient. I recommend giving it a read; it's pretty quick and was very informative.

One of the things I learned (that was hard to find a definitive answer for elsewhere) was the implication of lora_alpha on training. In the paper they indicate that they keep it at a 1-to-1 ratio with r, since changing alpha is equivalent to scaling the learning rate. Thus if r=16 they set lora_alpha=16, if r=8 they set lora_alpha=8, etc.
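
Putting those two points together, here's roughly the LoraConfig I mean (a sketch; the numbers are just examples, not tuned values):

```
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=16,                       # keep alpha equal to r, per the 1-to-1 ratio above
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # only the fused attention projection
)
```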

You are missing this line when using LoRA:
`"modules_to_save": ["embed_tokens", "lm_head"]`  # without these, the saved model won't have the newly resized embeddings
You want PEFT to save these as well (embed_tokens in my case, because I was adding some special tokens).
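
A sketch of where that goes, assuming you added special tokens and resized the embeddings first (module names depend on the model; for Falcon the input embedding is named word_embeddings rather than embed_tokens):

```
from peft import LoraConfig

# Hypothetical setup: special tokens were added, so the embedding matrix was resized beforehand:
# tokenizer.add_special_tokens({"additional_special_tokens": ["<my_token>"]})
# model.resize_token_embeddings(len(tokenizer))

config = LoraConfig(
    r=8,
    lora_alpha=8,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],
    modules_to_save=["embed_tokens", "lm_head"],  # full copies of these are saved with the adapter
)
```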