Fine-tuning Mistral model with a custom PAD token

Hello,

I am pretty new to ML. I am trying to fine-tune Mistral-7B-Instruct-v0.2 on a home setup. As mentioned here, I am setting a custom PAD token to prevent the fine-tuned model from giving very long responses.

I am following the guide in this Medium post: I run the fine-tuning, save the fine-tuned adapter and tokenizer, and then use llama.cpp to merge the original FP16 model with the fine-tuned LoRA adapter. The process works fine if I do not add the PAD token, but after adding it, llama.cpp fails to merge with the following error:

ggml/src/ggml.c:4760: GGML_ASSERT(ggml_can_repeat(b, a)) failed

My code is the following:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"

# Register a dedicated [PAD] token so padding is not confused with EOS
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('[PAD]')
tokenizer.padding_side = 'right'
tokenizer.model_max_length = 256

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=bnb_config)
# Grow the embedding matrix (and lm_head) to cover the new [PAD] token
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False


config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.1,
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

training_args = SFTConfig(
    output_dir="./results/fine_tuned_model_adapter",
    overwrite_output_dir=True,
    num_train_epochs=1, 
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    optim="adafactor",
    dataset_text_field='text',
    max_seq_length=1024,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    eval_steps=30,
    warmup_steps=1,
    gradient_accumulation_steps=4
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()

lora_model = trainer.model
lora_model.save_pretrained("./results/lora_adapter")
tokenizer = trainer.tokenizer
tokenizer.save_pretrained("./results/lora_adapter")

And then I use llama.cpp:

python3 llama.cpp/convert_lora_to_gguf.py \
    --base /home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/3ad372fc79158a2148299e3318516c786aeded6c/ \
    --outfile ./results/lora_adaptor.gguf --outtype f16 ./results/lora_adapter

llama.cpp/llama-export-lora -m Mistral.gguf --lora ./results/lora_adaptor.gguf -o ./results/Mistral-merged.gguf
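
In case it helps, here is how the tensor shapes in the two GGUF files can be dumped with the gguf Python package (pip install gguf, it ships with llama.cpp's gguf-py); this is only a rough sketch, assuming the file names used above:

from gguf import GGUFReader

for path in ("Mistral.gguf", "./results/lora_adaptor.gguf"):
    print(f"== {path} ==")
    for t in GGUFReader(path).tensors:
        # Only look at the embedding / output tensors, which depend on the vocab size
        if "token_embd" in t.name or "output" in t.name:
            print(t.name, list(t.shape))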

I am wondering if I am missing something. Any help is much appreciated.
Thanks!


Perhaps the code itself is not theoretically wrong. For now, there are three possible causes:

  1. llama.cpp misidentifies the adapter as a new model or a format it is not supposed to handle.
    crash on GGML_ASSERT: 'rwkv.cpp/ggml/src/ggml.c:5316: ggml_can_repeat_rows(b, a)' · Issue #138 · RWKV/rwkv.cpp · GitHub
  2. The model itself is potentially buggy in some way.
    Bug: [SYCL] GGML_ASSERT Error with Llama-3.1 SYCL Backend. Windows 11 OS · Issue #8660 · ggerganov/llama.cpp · GitHub
  3. The combination of PEFT and BNB in the current versions may cause a bug when merging LoRA in a 4-bit quantized state (practical experience); a possible workaround is sketched below.
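
If the third cause is what you are hitting, one way to side-step llama-export-lora entirely is to merge the adapter into a non-quantized FP16 copy of the base model with PEFT and then convert the merged checkpoint with llama.cpp's convert_hf_to_gguf.py. This is only a minimal sketch, assuming the adapter and the tokenizer with the extra [PAD] token were saved to ./results/lora_adapter as in your script; the ./results/merged_fp16 directory is just a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_dir = "./results/lora_adapter"  # adapter + tokenizer saved after training

# Load the base model in FP16, without bitsandbytes, so PEFT merges plain weights
base = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16)

# Re-apply the vocabulary change so the base matches the adapter's resized shapes
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base.resize_token_embeddings(len(tokenizer))
base.config.pad_token_id = tokenizer.pad_token_id

# Merge the LoRA weights into the FP16 base and save a plain HF checkpoint
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained("./results/merged_fp16")
tokenizer.save_pretrained("./results/merged_fp16")

The merged directory should then convert directly, e.g. python3 llama.cpp/convert_hf_to_gguf.py ./results/merged_fp16 --outfile ./results/Mistral-merged.gguf --outtype f16, with no llama-export-lora step involved.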