Trainer does not log validation loss and metrics

Hello, today I used the Trainer to train a LoRA model, but there is no log for the validation loss and metrics in the output of trainer.train(). The code is as follows:

import evaluate
import numpy as np
import torch
from datasets import load_from_disk
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)


tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base-v2")
dataset = load_from_disk("data")


def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=150)


dataset = dataset.map(tokenize, batched=True).remove_columns(["sentence"])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset.set_format(
    "torch", device=device, columns=["input_ids", "attention_mask", "labels"]
)
### input_ids must be the first column
dataset = dataset.map(lambda batch: {"new_labels": batch["labels"]}, batched=True)
dataset = dataset.remove_columns("labels")
dataset = dataset.rename_column("new_labels", "labels")

from peft import LoraConfig, TaskType, get_peft_model
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    base_model_name_or_path="vinai/phobert-base-v2",
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/phobert-base-v2", num_labels=2
)
model = get_peft_model(model, peft_config)

args = TrainingArguments(
    output_dir="./checkpoints",
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    per_device_eval_batch_size=64,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=4,
    optim="adamw_torch_fused",
    tf32=True,
    learning_rate=5e-5,
    weight_decay=0.01,
    num_train_epochs=10,
    logging_strategy="epoch",
    save_strategy="epoch",
    dataloader_num_workers=10,
    remove_unused_columns=False
)

args.set_dataloader(auto_find_batch_size=True)

trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)

trainer.train()

The output is:

[screenshot of the trainer.train() output]

I've been dealing with this same issue, and it looks like you have to provide the name of the column that contains your labels; for example, pass this as an argument to your trainer:

label_names = ["labels"],


Thanks for this reply, I was facing a similar issue.
I haven't tried it yet to see if it solves my problem as well.
If you don't mind me asking a clarifying question: does the label_names argument depend on which model one uses, or is it a generic argument regardless of the underlying model?

As a beginner with the HF library: on one hand we are all grateful that it exists, on the other hand IMHO it's a mess; it didn't copy the good practices from other libraries like scikit-learn or PyTorch.

e.g. from transformers import X, where X can be anything under the sun, instead of a more structured approach: from transformers.models import X, from transformers.tokenizers import Y, from transformers.datasets import Z, etc.

Also, the whole AutoModelXYZ is utterly confusing. It would have been much clearer if we only had from transformers.models import Model, ModelConfig, and then in the ModelConfig one defines whatever task they are interested in.


This didn't work in my case 😢😭

The keyword goes in TrainingArguments, not in Trainer.
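For reference, a minimal sketch based on the setup above (only the relevant arguments shown); label_names tells the Trainer which dataset column holds the labels:

args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    label_names=["labels"],  # name of the dataset column that contains the labels
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)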

I confirm that this fixes the error, thank you so much. It took me some time to find the solution 🙂