Fine-tuning Mistral model with a custom PAD token

Hello,

I am pretty new to ML. I am trying to fine-tune Mistral-7B-Instruct-v0.2 on a home setup. As mentioned here, I am setting a custom PAD token to prevent the fine-tuned model from giving very long responses.

I am following the guide in this Medium post: I run the fine-tuning, save the fine-tuned adapter and tokenizer, and then use llama.cpp to merge the original FP16 model with the fine-tuned LoRA adapter. The process works fine if I do not add the PAD token, but after adding it, llama.cpp fails to merge with the following error:

ggml/src/ggml.c:4760: GGML_ASSERT(ggml_can_repeat(b, a)) failed

My code is the following:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"

# Register a dedicated [PAD] token so padding is not confused with EOS
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('[PAD]')
tokenizer.padding_side = 'right'
tokenizer.model_max_length = 256

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=bnb_config)
# Grow the embedding matrix (and lm_head) to cover the new [PAD] token
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False


config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.1,
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

training_args = SFTConfig(
    output_dir="./results/fine_tuned_model_adapter",
    overwrite_output_dir=True,
    num_train_epochs=1, 
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    optim="adafactor",
    dataset_text_field='text',
    max_seq_length=1024,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    eval_steps=30,
    warmup_steps=1,
    gradient_accumulation_steps=4
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()

lora_model = trainer.model
lora_model.save_pretrained("./results/lora_adapter")
tokenizer = trainer.tokenizer
tokenizer.save_pretrained("./results/lora_adapter")

And then I use llama.cpp:

python3 llama.cpp/convert_lora_to_gguf.py \
    --base /home/user/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/3ad372fc79158a2148299e3318516c786aeded6c/ \
    --outfile ./results/lora_adaptor.gguf --outtype f16 ./results/lora_adapter

llama.cpp/llama-export-lora -m Mistral.gguf --lora ./results/lora_adaptor.gguf -o ./results/Mistral-merged.gguf
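
In case it helps, here is how the tensor shapes in the two GGUF files can be dumped with the gguf Python package (pip install gguf, it ships with llama.cpp's gguf-py); this is only a rough sketch, assuming the file names used above:

from gguf import GGUFReader

for path in ("Mistral.gguf", "./results/lora_adaptor.gguf"):
    print(f"== {path} ==")
    for t in GGUFReader(path).tensors:
        # Only look at the embedding / output tensors, which depend on the vocab size
        if "token_embd" in t.name or "output" in t.name:
            print(t.name, list(t.shape))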

I am wondering if I am missing something. Any help is much appreciated.
Thanks!


Perhaps the code itself is not theoretically wrong. For now, there are three possible causes:

  1. llama.cpp misidentifies the adapter as a new model or a format it is not supposed to handle.
    crash on GGML_ASSERT: 'rwkv.cpp/ggml/src/ggml.c:5316: ggml_can_repeat_rows(b, a)' · Issue #138 · RWKV/rwkv.cpp · GitHub
  2. The model itself is potentially buggy in some way.
    Bug: [SYCL] GGML_ASSERT Error with Llama-3.1 SYCL Backend. Windows 11 OS · Issue #8660 · ggerganov/llama.cpp · GitHub
  3. The combination of PEFT and BNB in the current versions may cause a bug when merging LoRA in a 4-bit quantized state (practical experience); a possible workaround is sketched below.
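
If the third cause is what you are hitting, one way to side-step llama-export-lora entirely is to merge the adapter into a non-quantized FP16 copy of the base model with PEFT and then convert the merged checkpoint with llama.cpp's convert_hf_to_gguf.py. This is only a minimal sketch, assuming the adapter and the tokenizer with the extra [PAD] token were saved to ./results/lora_adapter as in your script; the ./results/merged_fp16 directory is just a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_dir = "./results/lora_adapter"  # adapter + tokenizer saved after training

# Load the base model in FP16, without bitsandbytes, so PEFT merges plain weights
base = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16)

# Re-apply the vocabulary change so the base matches the adapter's resized shapes
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base.resize_token_embeddings(len(tokenizer))
base.config.pad_token_id = tokenizer.pad_token_id

# Merge the LoRA weights into the FP16 base and save a plain HF checkpoint
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained("./results/merged_fp16")
tokenizer.save_pretrained("./results/merged_fp16")

The merged directory should then convert directly, e.g. python3 llama.cpp/convert_hf_to_gguf.py ./results/merged_fp16 --outfile ./results/Mistral-merged.gguf --outtype f16, with no llama-export-lora step involved.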