I’m running into problems when fine-tuning a pre-quantised model. The training loss is sometimes 0 and the validation loss is NaN, so I assume this is an overflow issue?
Does anyone see anything obviously wrong with the way I am training my model?
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

# Load the pre-quantised GPTQ model with the exllama kernels disabled so the adapters can train
config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", quantization_config=config, device_map="auto", torch_dtype="auto")
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")

peft_config = LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16)
...
training_args = TrainingArguments(fp16=True, optim="paged_adamw_32bit", ...)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=peft_config,
    ...
)
trainer.train()
#####
... ...
{'loss': 0.0, 'learning_rate': 0.0004953917050691245, 'epoch': 0.27}
{'loss': 0.0, 'learning_rate': 0.0004953917050691245, 'epoch': 0.28}
{'loss': 2.0689, 'learning_rate': 0.0004953917050691245, 'epoch': 0.28}
{'eval_loss': nan, 'eval_runtime': 149.173, 'eval_samples_per_second': 0.597, 'eval_steps_per_second': 0.302, 'epoch': 0.28}
#####
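In case it helps with diagnosis, here is a rough sketch of how I could check whether a single fp16 forward pass already overflows (this assumes the model and tokenizer loaded above; the prompt string is just a placeholder):

import torch

# Hypothetical overflow check: run one fp16 forward pass and look for inf/nan in the logits
batch = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    out = model(**batch)
print("inf in logits:", torch.isinf(out.logits).any().item())
print("nan in logits:", torch.isnan(out.logits).any().item())

If either of those prints True, that would at least confirm the overflow suspicion before digging into the trainer settings.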