Classification fine-tuning of the non-quantized model works as expected with the following code (assume all imports and symbols not shown here are pre-defined):
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=1,
    lora_alpha=8,
    lora_dropout=0.5,
)
model = get_peft_model(model, lora_config)
model.config.pad_token_id = model.config.eos_token_id

training_args = TrainingArguments(
    output_dir="test_trainer",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    eval_steps=50,
    logging_steps=50,
    evaluation_strategy="steps",
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['eval'],
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)
trainer.train()
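For completeness, the helper symbols above are standard. For example, my compute_metrics is equivalent to this sketch (the exact implementation may differ in minor details):

```python
import numpy as np

# Sketch of the compute_metrics referenced above: the Trainer passes in
# (logits, labels); we take the argmax over the label dimension and report
# plain accuracy. My actual version is equivalent to this.
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```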
With losses being logged as:
Step Training Loss Validation Loss Accuracy
50 0.699500 1.295200 0.514000
100 0.673200 1.224600 0.513400
...
But when loading the model in 4-bit with the following change (all subsequent lines remaining the same):
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
...
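For what it's worth, the adapter still reports trainable parameters in the 4-bit case. I compare the two setups with the small helper below (count_trainable is my own name; it is equivalent in spirit to PEFT's print_trainable_parameters):

```python
# Hypothetical helper to compare trainable vs. total parameter counts
# between the full-precision and 4-bit runs. Pass model.named_parameters().
def count_trainable(named_params):
    trainable = total = 0
    for _, p in named_params:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total

# usage: trainable, total = count_trainable(model.named_parameters())
```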
The training log is not as expected:
Step Training Loss Validation Loss Accuracy
50 3.699500 nan 1.000000
100 0.000000 nan 1.000000
...
And the model does not perform well after training.
Is there anything that I am missing or implementing incorrectly?
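In case it is useful: this is how I confirm that the evaluation logits themselves are non-finite in the 4-bit run (has_nonfinite is my own helper; I call it on the predictions returned by trainer.predict):

```python
import numpy as np

# Hypothetical check on the eval logits (e.g. trainer.predict(...).predictions)
# to confirm the nan validation loss comes from non-finite logits rather than
# from the metric computation.
def has_nonfinite(logits):
    return not np.isfinite(np.asarray(logits)).all()
```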