Couple of questions about Trainer

farbodr · June 13, 2023, 6:03pm

I’m trying to train a BertForMaskLM model from scratch for a domain-specific vocab. I’ve trained a whole word tokenizer that seems to work well, but I see some issues with my trainer.

Here is the code for my trainer and my questions:

It doesn’t look like that compute_metric is getting called
The validation loss is always nan? Please explain why.
Shouldn’t be able to see the metric (in this case, accuracy) at each step checkpoint
When calling trainer.predict(test_dataset) the EvalPrediction items are all null.

metrics = evaluate.load('accuracy')

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=-1)
    result = metrics.compute(predictions=preds, references=p.label_ids)
    return result

batch_size = 32

config = BertConfig(vocab_size=len(tokenizer),
                    output_hidden_states=True)
bert_model = BertForMaskedLM(config)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="./bert-checkpoints",
    overwrite_output_dir=True,
    num_train_epochs=2,
    auto_find_batch_size=True,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    save_steps=0,
    save_total_limit=2,
    prediction_loss_only=True,
    evaluation_strategy = 'steps',
    logging_steps=1000,
    eval_steps=1000,
    do_train=True,
    do_eval=True
)

trainer = Trainer(
    model=bert_model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

trainer.train()

Topic		Replies	Views
Getting the MLM accuracy for the BERT model I am training from scratch Beginners	7	5368	October 5, 2023
Pre-training a BERT model from scratch with custom tokenizer Intermediate	5	3106	January 11, 2022
I used a trainer to pretraining a BertForMaskedLM model, but the training loss always be zero 🤗Transformers	0	234	August 31, 2023
Trainer predict or evaluate returns zero for metrics 🤗Transformers	0	56	July 11, 2024
ModernBERT MaskedLM nan training loss Models	7	603	January 27, 2025

Couple of questions about Trainer

Related topics