I have fine-tuned falcon-7b for a specific task using the PEFT library, specifically a LoRA adapter. My fine-tuning works well, and I wanted to use the result with text-generation-inference (here). The Falcon model is supported, but PEFT is not, so I merged my LoRA weights into the base model. I can use the merged model with transformers `AutoModel`, but text-generation-inference fails to load it. Here are the error messages:
Torch: RuntimeError: weight transformer.word_embeddings.weight does not exist
Safetensors: RuntimeError: weight lm_head.weight does not exist, and indeed there is no lm_head field in the config.
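For reference, this is roughly how I did the merge (a sketch; the paths are placeholders):

```
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", trust_remote_code=True
)
# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
# Fold the adapter weights into the base weights and drop the PEFT wrappers
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```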
Could you show the fine-tuning code? It's hard to see where this error happens. It looks to me like you fine-tuned the model with the wrong attention blocks, but maybe I'm wrong.
Honestly I'm not really sure, but I think you are replacing some layers with others by setting `fan_in_fan_out=True`, and this could be the reason.
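To illustrate (a sketch only, values are illustrative): `fan_in_fan_out=True` is meant for Conv1D-style layers as in GPT-2, and Falcon uses regular `nn.Linear`, so it should stay `False`:

```
from peft import LoraConfig

config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=16,
    target_modules=["query_key_value"],
    fan_in_fan_out=False,                 # Falcon's layers are nn.Linear, not Conv1D
)
```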
For tiiuae/falcon-7b I've had luck fine-tuning and performing inference with PEFT.
The main difference in my code is that I'm only targeting `query_key_value`.
Maybe give that a shot. I just read the original LoRA paper last night, and their finding was that targeting just the query and value projections is likely sufficient. I recommend giving it a read; it's pretty quick and was very informative.
One of the things I learned (that was hard to find a definitive answer for elsewhere) was the implication of lora_alpha for training. In the paper they indicate that they keep it at a 1-to-1 ratio with r, since the alpha/r scaling is equivalent to tuning the learning rate. So if r=16 they set lora_alpha=16, if r=8 they set lora_alpha=8, and so on.
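To make that concrete, here is a rough sketch of the config I mean (hyperparameter values are just illustrative):

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", trust_remote_code=True
)

# Keep lora_alpha equal to r (1-to-1), since the alpha/r scaling factor
# acts like a learning-rate multiplier on the adapter update
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query_key_value"],  # Falcon fuses Q, K and V into this module
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```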
You are missing this line in your LoraConfig when using LoRA: `modules_to_save=["embed_tokens", "lm_head"]`. Without it, the saved model won't have the newly resized embeddings.
You want PEFT to save these modules as well (`embed_tokens` in my case, because I was adding some special tokens). A sketch of the full flow is below.
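Something like this (a sketch; the added token is a placeholder, and the module names must match your architecture: Falcon's custom code, for instance, names its embedding `word_embeddings`, as the first error above shows):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.add_special_tokens({"additional_special_tokens": ["<my_token>"]})  # placeholder token

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", trust_remote_code=True
)
model.resize_token_embeddings(len(tokenizer))  # grow embeddings to cover the new tokens

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query_key_value"],
    # Save the resized embedding and output head alongside the adapter;
    # check your model for the exact module names
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
```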