Hello,
I am pretty new to ML. I am trying to fine-tune Mistral-7B-Instruct-v0.2 on a home setup. As mentioned here, I am setting a custom PAD token to prevent the fine-tuned model from giving very long responses.
I am following the guide in this Medium post: I run the fine-tuning, save the fine-tuned adapter and tokenizer, and then use llama.cpp to merge the original FP16 model with the fine-tuned LoRA adapter. The process works fine if I do not add the PAD token, but after adding it, llama.cpp fails to merge and gives the following error:
ggml/src/ggml.c:4760: GGML_ASSERT(ggml_can_repeat(b, a)) failed
My code is the following:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"

# Tokenizer: register a dedicated [PAD] token (Mistral has no pad token by default)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('[PAD]')
tokenizer.padding_side = 'right'
tokenizer.model_max_length = 256
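# QLoRA: 4-bit NF4 quantization with double quantization and bfloat16 compute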
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
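# Load the quantized base model, grow the embeddings for the new [PAD] token, and prepare for k-bit training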
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=bnb_config)
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False
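# LoRA: rank-32 adapters on the attention/MLP projections and lm_head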
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.1,
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
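# Supervised fine-tuning with TRL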
training_args = SFTConfig(
    output_dir="./results/fine_tuned_model_adapter",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    optim="adafactor",
    dataset_text_field='text',
    max_seq_length=1024,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    eval_steps=30,
    warmup_steps=1,
    gradient_accumulation_steps=4
)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
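# Save the LoRA adapter together with the (resized) tokenizer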
lora_model = trainer.model
lora_model.save_pretrained("./results/lora_adapter")
tokenizer = trainer.tokenizer
tokenizer.save_pretrained("./results/lora_adapter")
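For reference, here is a small sanity check (just a sketch, using only the objects defined above) of what adding the PAD token actually does to the sizes:

# Quick check of the vocabulary/embedding sizes after adding [PAD] and resizing
print(len(tokenizer))                             # 32001 (32000 base vocab + [PAD])
print(model.get_input_embeddings().weight.shape)  # torch.Size([32001, 4096])
print(model.config.vocab_size)                    # 32001

I am guessing this mismatch with the 32000-token vocabulary of the original model is related to the failure, but I am not sure.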
And then I am using llama.cpp to convert the adapter to GGUF and merge it into the base model:
python3 llama.cpp/convert_lora_to_gguf.py --base /home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/3ad372fc79158a2148299e3318516c786aeded6c/ \
    --outfile ./results/lora_adaptor.gguf --outtype f16 ./results/lora_adapter
llama.cpp/llama-export-lora -m Mistral.gguf --lora ./results/lora_adaptor.gguf -o ./results/Mistral-merged.gguf
I am wondering if I am missing something. Any help is much appreciated.
Thanks!