Trainer in PEFT doesn't report evaluation metrics

Hi,

I am fine-tuning the Llama3-8B model for a sequence classification task using QLoRA. Whenever I use QLoRA, the evaluation metrics never get printed (see the results at the bottom). However, when I use the full 32-bit precision model without any PEFT, the metrics do get printed. I am not sure if this is a bug, so I am asking here. I also observed the same behavior with other models such as BERT, so it is not limited to Llama models. If you need any other details, please let me know. Thanks!

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          BitsAndBytesConfig, Trainer, TrainingArguments)

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.half,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

# quantization_config=BitsAndBytesConfig(
#     quant_type="dynamic",  # Use dynamic quantization
#     bits=4  # Specify 4-bit quantization
# )

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Ensure the tokenizer has a padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           #torch_dtype=torch.half,
                                                           quantization_config=quantization_config,
                                                           )
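
The peft_model passed to the Trainer below is this base model wrapped with a LoRA adapter, roughly along these lines (the LoraConfig values here are placeholders rather than the exact settings):

# Sketch of the LoRA wrapping; r / lora_alpha / target_modules are illustrative
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # depends on the base model
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()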

Below is my compute_metrics function:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    acc = accuracy_score(p.label_ids, preds)
    return {
        'accuracy': acc,
        'precision': precision,
        'recall': recall,
        'f1': f1,
    }

These are my training arguments and trainer:

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    warmup_steps=500,
    weight_decay=0.01,
    #logging_dir='./logs',
    evaluation_strategy="steps",
    eval_steps=500,
    logging_steps=50
)


# Initialize the Trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# Train the model
trainer.train()

Outputs:

{'eval_runtime': 1229.9596, 'eval_samples_per_second': 20.326, 'eval_steps_per_second': 2.541, 'epoch': 1.92}
{'loss': 0.2013, 'grad_norm': 90.90315246582031, 'learning_rate': 6.956521739130435e-07, 'epoch': 1.94}
{'loss': 0.124, 'grad_norm': 1.2543754577636719, 'learning_rate': 5.217391304347826e-07, 'epoch': 1.95}
{'loss': 0.1548, 'grad_norm': 0.5100873708724976, 'learning_rate': 3.4782608695652175e-07, 'epoch': 1.97}
{'loss': 0.2046, 'grad_norm': 0.08409222960472107, 'learning_rate': 1.7391304347826088e-07, 'epoch': 1.98}
{'loss': 0.2284, 'grad_norm': 0.019938422366976738, 'learning_rate': 0.0, 'epoch': 2.0}
{'train_runtime': 20509.1509, 'train_samples_per_second': 2.438, 'train_steps_per_second': 0.305, 'train_loss': 0.24805847625732422, 'epoch': 2.0}
Validation Results: {'eval_runtime': 1226.4144, 'eval_samples_per_second': 20.385, 'eval_steps_per_second': 2.548, 'epoch': 2.0}

Hi, I am running into this same problem using TinyBERT + LoRA. Were you ever able to figure out how to fix this? Thanks!


I haven’t looked into this in depth, but I’ve seen quite a few people on this forum, Discord, and GitHub reporting the same problem.
Some of them seem to have fixed it by upgrading PEFT, but there may be a bug somewhere, possibly involving other libraries as well.

pip install -U peft bitsandbytes
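
If upgrading alone doesn’t help, another workaround I’ve seen mentioned in similar reports (unverified, so treat it as an assumption) is to tell the Trainer explicitly which dataset column holds the labels. With a PEFT-wrapped model the Trainer can fail to infer this from the model’s forward signature, in which case no label_ids reach compute_metrics and the metrics are never computed:

# Unverified workaround: name the label column explicitly so the Trainer
# keeps it during evaluation and actually calls compute_metrics.
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="steps",
    eval_steps=500,
    label_names=["labels"],
)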