Hi, I’m fine-tuning LLMs like Llama 2 and similar in a fairly standard way: I load the base model from HF and add LoRA matrices for fine-tuning (training set: ~1.5k cases). The problem I’m facing is that at inference time, slight changes in the prompt, like adding a dot or removing a seemingly useless word, sometimes completely change the model output. Has anybody already faced this? Any ideas how to make it more stable?
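For reference, the inference side looks roughly like this (a minimal sketch; the base model name, adapter path, prompt, and generation settings are placeholders rather than my exact setup):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder base model
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder adapter dir
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("<prompt goes here>", return_tensors="pt").to(base.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))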
Training snippets:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    ft_parameters.MODEL_NAME,
    quantization_config=bnb_config,
    device_map=ft_parameters.DEVICE_MAP,
    trust_remote_code=True,
    use_cache=False,
)
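# Tokenizer that gets passed to SFTTrainer below (sketch; the pad-token handling
# is my own assumption, since the Llama tokenizer ships without a pad token)
tokenizer = AutoTokenizer.from_pretrained(ft_parameters.MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"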
# Prepare the model for LoRA training
model.config.pretraining_tp = 1  # use the standard (non-parallel) linear layer path
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
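# Quick sanity check with PEFT's built-in helper: prints how many parameters
# the LoRA adapter actually trains versus the frozen base model
model.print_trainable_parameters()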
args = TrainingArguments(
    output_dir=OUTPUT_MODEL_DIR,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    tf32=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    evaluation_strategy="epoch" if ft_parameters.DO_EVAL else "no",
    report_to="none",
    disable_tqdm=False,  # keep tqdm enabled (set to True if packing makes the progress values misleading)
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_train,
    eval_dataset=dataset_eval,
    peft_config=peft_config,
    max_seq_length=ft_parameters.MAX_SEQ_LENGTH,
    tokenizer=tokenizer,
    dataset_text_field="instruction",
    args=args,
)
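Training is then launched with the standard calls (as far as I understand, save_model on a PEFT-wrapped model writes just the adapter weights):

trainer.train()
trainer.save_model(OUTPUT_MODEL_DIR)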