QLoRA with GPTQ

I’m having problems fine-tuning pre-quantised models: the training loss is sometimes 0 and the validation loss is nan, which makes me suspect an overflow issue.
Does anyone see anything obviously wrong with the way I am training my model?

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          GPTQConfig, TrainingArguments)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

# Load the pre-quantised checkpoint (exllama kernels disabled, as required for fine-tuning)
config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    quantization_config=config,
    device_map="auto",
    torch_dtype="auto",
)
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")

peft_config = LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16)

...

training_args = TrainingArguments(fp16=True, optim="paged_adamw_32bit", ...)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=peft_config,
    ...
)

trainer.train()

#####

{'loss': 0.0, 'learning_rate': 0.0004953917050691245, 'epoch': 0.27}
{'loss': 0.0, 'learning_rate': 0.0004953917050691245, 'epoch': 0.28}
{'loss': 2.0689, 'learning_rate': 0.0004953917050691245, 'epoch': 0.28}
{'eval_loss': nan, 'eval_runtime': 149.173, 'eval_samples_per_second': 0.597, 'eval_steps_per_second': 0.302, 'epoch': 0.28}

#####

UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.

It is working now after setting `tokenizer.padding_side = 'right'`. But why does the padding side affect overflow in half-precision training?
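For completeness, this is the change that fixed it (a minimal sketch; the pad-token line is my own addition, since the Llama-2 tokenizer ships without a pad token):

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
tokenizer.padding_side = "right"  # what the SFTTrainer warning asks for
tokenizer.pad_token = tokenizer.eos_token  # assumption: no pad token defined, so reuse EOS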

I’d also like to know why.

Here, however, the docs explicitly say to use `padding_side='left'`: Generation with LLMs
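If I read the docs right, the two recommendations cover different phases rather than contradicting each other: SFTTrainer wants right padding during training, while the generation guide wants left padding so that each prompt in a batch ends at the last position and generate() continues from real tokens instead of pads. A minimal sketch of the generation side, reusing the model and tokenizer from above (the prompts are placeholders):

# Batched generation: pad on the left, so every prompt ends at the final
# position and generate() continues from real tokens rather than padding.
tokenizer.padding_side = "left"
prompts = ["Hello, my name is", "The capital of France is"]  # placeholder prompts
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))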