Making a fine-tuned LLM more stable

Hi, I’m fine-tuning LLMs like Llama 2 and similar in quite a standard way: I take the base model from HF and add LoRA matrices for fine-tuning (training size: ~1.5k cases). The problem I’m facing is that at inference time, slight changes in the prompt, like adding a dot or removing some seemingly useless word, sometimes completely change the model output. Maybe somebody has already faced this; any ideas how to make it more stable?

Training snippets:

    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
        TrainingArguments,
    )
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from trl import SFTTrainer

    # 4-bit NF4 quantization (QLoRA-style)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        ft_parameters.MODEL_NAME,
        quantization_config=bnb_config,
        device_map=ft_parameters.DEVICE_MAP,
        trust_remote_code=True,
        use_cache=False,
    )

    # PREPARE MODEL FOR TRAINING
    model.config.pretraining_tp = 1  # use the standard (non-tensor-parallel) linear layers
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)

    args = TrainingArguments(
        output_dir=OUTPUT_MODEL_DIR,
        num_train_epochs=5,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=2,
        gradient_checkpointing=True,
        optim="paged_adamw_32bit",
        logging_steps=10,
        save_strategy="epoch",
        learning_rate=2e-4,
        bf16=True,
        tf32=True,
        max_grad_norm=0.3,
        warmup_ratio=0.03,
        lr_scheduler_type="constant",
        evaluation_strategy="epoch" if ft_parameters.DO_EVAL else "no",
        report_to="none",
        disable_tqdm=False,  # keep tqdm enabled (note: with packing the reported progress values are incorrect)
    )
    # load the matching tokenizer for the base model
    tokenizer = AutoTokenizer.from_pretrained(ft_parameters.MODEL_NAME, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token by default

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset_train,
        eval_dataset=dataset_eval,
        peft_config=peft_config,
        max_seq_length=ft_parameters.MAX_SEQ_LENGTH,
        tokenizer=tokenizer,
        dataset_text_field="instruction",
        args=args,
    )

One way to make the model more robust is to include those kinds of changes in your training data (as a form of data augmentation), as sketched below.
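
Here is a minimal sketch of such augmentation, assuming your training set is a `datasets.Dataset` with the `instruction` text field used in the snippet above (the helper names, filler-word list, and probabilities are illustrative, not a fixed recipe):

    import random
    from datasets import concatenate_datasets

    def perturb_punctuation(text: str) -> str:
        """Randomly add or drop a trailing dot, mimicking the changes seen at inference."""
        text = text.rstrip()
        if text.endswith("."):
            return text[:-1] if random.random() < 0.5 else text
        return text + "." if random.random() < 0.5 else text

    def drop_filler_word(text: str, fillers=("just", "really", "basically")) -> str:
        """Remove one seemingly useless word, if any is present."""
        words = text.split()
        candidates = [i for i, w in enumerate(words) if w.lower().strip(".,") in fillers]
        if candidates:
            words.pop(random.choice(candidates))
        return " ".join(words)

    def augment(example: dict) -> dict:
        text = example["instruction"]
        text = perturb_punctuation(text)
        text = drop_filler_word(text)
        return {"instruction": text}

    # train on the original examples plus their perturbed copies
    dataset_train_aug = dataset_train.map(augment)
    dataset_train = concatenate_datasets([dataset_train, dataset_train_aug])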

Thanks, there are actually A LOT of possible transformations, but I will give it a try. For example: changing the position of some sentences in the text, switching upper/lowercase for some fragments, etc.
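
A rough sketch of those two transformations, with naive splitting on `". "` (the helper names are mine, and a real implementation would want a proper sentence splitter):

    import random

    def shuffle_sentences(text: str) -> str:
        """Naively split on '. ' and shuffle the sentence order."""
        sentences = [s for s in text.split(". ") if s]
        random.shuffle(sentences)
        return ". ".join(sentences)

    def random_case_fragment(text: str) -> str:
        """Upper- or lowercase one randomly chosen word."""
        words = text.split()
        if words:
            i = random.randrange(len(words))
            words[i] = words[i].upper() if random.random() < 0.5 else words[i].lower()
        return " ".join(words)

These could be plugged into the same map-and-concatenate pattern as above.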

@nielsr thanks, even simple augmentation with sentence shuffling helped a bit. It’s a good direction; more sophisticated augmentation could probably help even more.