How to do classification fine-tuning of quantized models?

Classification fine-tuning of a non-quantized model works as expected with the following code (assume all imports and undefined symbols are set up beforehand):

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label=id2label, 
    label2id=label2id
)
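
For context, id2label and label2id are the usual two-class mappings (the class names here are placeholders for the ones in my dataset):

# Hypothetical class names; the real ones come from my dataset
id2label = {0: "negative", 1: "positive"}
label2id = {"negative": 0, "positive": 1}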

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification task
    r=1,
    lora_alpha=8,
    lora_dropout=0.5,
)

model = get_peft_model(model, lora_config)
model.config.pad_token_id = model.config.eos_token_id  # GPT-2 has no pad token
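
As a sanity check, PEFT reports that only a small fraction of the parameters is trainable (the LoRA weights plus the classification head, which TaskType.SEQ_CLS keeps trainable):

# Prints trainable vs. total parameter counts for the wrapped model
model.print_trainable_parameters()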

training_args = TrainingArguments(
    output_dir="test_trainer", 
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    eval_steps=50,
    logging_steps=50,
    evaluation_strategy="steps",
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['eval'],
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

trainer.train()
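
For completeness, the remaining symbols (tokenizer, dataset, collator, metrics) follow the standard text-classification recipe, roughly like this (the dataset name and text column are placeholders):

import numpy as np
import evaluate
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

# "imdb" is a stand-in for my actual dataset
raw = load_dataset("imdb")
dataset = DatasetDict({"train": raw["train"], "eval": raw["test"]})
tokenized_dataset = dataset.map(tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)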

The losses are logged as expected:

Step    Training Loss    Validation Loss    Accuracy
50      0.699500         1.295200           0.514000
100     0.673200         1.224600           0.513400
...

But when loading the model in 4-bit with the following change (subsequent lines remaining the same):

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    num_labels=2,
    id2label=id2label, 
    label2id=label2id
)
... 
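
For reference, which layers bitsandbytes actually converts to 4-bit can be checked with plain PyTorch introspection:

# List the modules that were replaced with bitsandbytes 4-bit linear layers
for name, module in model.named_modules():
    if type(module).__name__ == "Linear4bit":
        print(name)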

The training log is not as expected:

Step    Training Loss    Validation Loss    Accuracy
50      3.699500         nan                1.000000
100     0.000000         nan                1.000000
...

The training loss collapses to zero, the validation loss is nan (while the reported accuracy is a suspicious 1.000000), and the model performs poorly after training.

Is there anything that I am missing or implementing incorrectly?