Hi
I am training a BertForMaskedLM model from scratch.
This is my tokenizer (previously trained):
from transformers import BertTokenizer
tokenizer = BertTokenizer('vocab.txt')
This is my config:
from transformers import BertConfig
config = BertConfig(
    vocab_size=20000,
    max_position_embeddings=258
)
This is how I load the model from the last checkpoint:
from transformers import BertForMaskedLM
model = BertForMaskedLM.from_pretrained("/BERT/bert-checkpoints/checkpoint-1726500", config=config)
My data collator:
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
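For context, the collator sets labels to -100 at every position it does not mask, so only the masked tokens contribute to the loss. A minimal, self-contained sketch of this (using a hypothetical toy vocab, not my actual vocab.txt):

```python
import tempfile
import torch
from transformers import BertTokenizer, DataCollatorForLanguageModeling

# Hypothetical toy vocab standing in for my real vocab.txt
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(vocab))

tok = BertTokenizer(f.name)
collator = DataCollatorForLanguageModeling(
    tokenizer=tok, mlm=True, mlm_probability=0.15
)

enc = tok("hello world")  # -> input_ids [2, 5, 6, 3] ([CLS] hello world [SEP])
batch = collator([{"input_ids": enc["input_ids"]}])
# batch["labels"] holds the original token id at masked positions
# and -100 everywhere else (special tokens are never masked)
```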
The compute metrics function:
import numpy as np
from datasets import load_metric
metric = load_metric("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
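Side note: since the labels the collator produces are -100 at unmasked positions, an accuracy that scores only the masked tokens (a plain-NumPy sketch, not what I currently use) could look like this:

```python
import numpy as np

def masked_accuracy(logits, labels):
    # Score only positions the collator actually masked (labels != -100)
    predictions = np.argmax(logits, axis=-1)
    mask = labels != -100
    return {"accuracy": float((predictions[mask] == labels[mask]).mean())}

# Toy check: two masked positions, one predicted correctly
logits = np.zeros((1, 3, 5))
logits[0, 0, 2] = 1.0              # position 0 predicts token 2
logits[0, 1, 4] = 1.0              # position 1 predicts token 4
labels = np.array([[2, 3, -100]])  # position 2 was not masked
# masked_accuracy(logits, labels) -> {"accuracy": 0.5}
```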
The training arguments:
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(
    output_dir="./bert-checkpoints",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    save_steps=500,
    save_total_limit=2,
    prediction_loss_only=True,
    evaluation_strategy="steps"
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
    eval_dataset=dataset,
    compute_metrics=compute_metrics
)
And finally, I use this to train the model (which works fine):
trainer.train()
However, while the model is training I only see three metrics: Step, Training Loss, and Validation Loss. I also want to see the accuracy of the masked language model (MLM accuracy). How can I do that? Note that I have already defined the compute_metrics function, which computes the accuracy, but it is still not shown and I cannot tell what is wrong.
Note: by the way, my dataset is an instance of a torch.utils.data.Dataset subclass which has a member called "examples". For instance, dataset.examples[0] is [2, 507, 157, 3656, 117, 2100, 521, 122, 280, 3].
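To make that concrete, here is a minimal sketch of such a dataset (the class name and structure are my illustration, not my exact code):

```python
import torch
from torch.utils.data import Dataset

class TokenizedLinesDataset(Dataset):
    """Hypothetical sketch: stores pre-tokenized id sequences
    in an `examples` member, one list of token ids per line."""
    def __init__(self, examples):
        self.examples = examples  # list of lists of token ids

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        # Return the shape the MLM data collator expects
        return {"input_ids": torch.tensor(self.examples[i])}

dataset = TokenizedLinesDataset(
    [[2, 507, 157, 3656, 117, 2100, 521, 122, 280, 3]]
)
```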