BertForMaskedLM training from scratch not converging

Hello,

I am a researcher at ETH Zürich trying to use BertForMaskedLM for chemistry, an approach that has already been published as the “molecular transformer”.

Unfortunately, the training does not seem to converge. Each training epoch finishes in only a few seconds, whereas the same training with the same hyperparameters takes approximately 24 h using SimpleTransformers (see Here and Here).

What I am doing right now:

My train and eval datasets are already tokenized using the encode_plus method of a slightly modified BertTokenizer:

>>> eval_dataset.__getitem__(0)
{'input_ids': tensor([ 3, 25,  7, ...       0]), 'token_type_ids': tensor([0, 0, 0, 0, ...       0]), 'attention_mask': tensor([1, 1, 1, 1, ...       0])}
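For reference, here is roughly how these datasets are built. This is a minimal sketch; the class name SmilesDataset, the max_length of 128, and the padding/truncation settings are placeholders for illustration, not the exact code I run:

import torch
from torch.utils.data import Dataset

class SmilesDataset(Dataset):
    """Wraps a list of SMILES strings as tokenized tensors."""

    def __init__(self, smiles_list, tokenizer, max_length=128):
        self.examples = [
            tokenizer.encode_plus(
                s,
                max_length=max_length,
                padding="max_length",
                truncation=True,
                return_tensors="pt",
            )
            for s in smiles_list
        ]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        # squeeze the batch dimension added by return_tensors="pt"
        return {k: v.squeeze(0) for k, v in self.examples[idx].items()}

The training code itself looks like this: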
from transformers import (
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer
)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=self.tokenizer,        # the slightly modified BertTokenizer
    mlm=True,
    mlm_probability=self.mask_prob,  # masking probability for MLM
)



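Since the tokenizer is modified, one thing I can sanity-check is that the mask token is defined and that the collator produces non-trivial labels; if mask_token is missing or mapped to the wrong id, the MLM loss would be computed on garbage. A quick diagnostic (not part of the training code, and assuming tokenizer and train_dataset are available as above):

# The collator needs a valid [MASK] token to build MLM labels
print(tokenizer.mask_token, tokenizer.mask_token_id)
print(tokenizer.pad_token, tokenizer.pad_token_id)

# Inspect one collated batch: labels should be -100 everywhere
# except at the positions that were masked (about mask_prob of them)
batch = data_collator([train_dataset[i] for i in range(4)])
print(batch["input_ids"].shape)
print((batch["labels"] != -100).float().mean())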
training_args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="epoch",
    num_train_epochs=50,
    report_to="wandb",
    learning_rate=0.00005,
)

bert_config = BertConfig(
    vocab_size=120,  # this should be correct for our vocabulary
    num_attention_heads=4,
    hidden_size=256,
    intermediate_size=512,
)

model = BertForMaskedLM(bert_config)
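If the config's vocab_size does not match the tokenizer, training can silently misbehave (or crash on out-of-range ids), so it may be worth asserting that they agree. A small check, again assuming the tokenizer object from above:

# The embedding table must cover every id the tokenizer can emit
assert bert_config.vocab_size >= len(tokenizer), (
    f"config vocab_size={bert_config.vocab_size} < tokenizer size={len(tokenizer)}"
)
print(model.num_parameters())  # rough sanity check on model size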

# SmilesTrainer is our custom subclass of the Trainer imported above
trainer = SmilesTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
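Regarding the epochs that finish in a few seconds: with the default per_device_train_batch_size of 8, the number of optimizer steps per epoch should be roughly len(train_dataset) / (8 * number of GPUs), so one thing I can compare against the Trainer's logged step count is the dataset length itself (dataset and argument names as above):

# If __len__ is wrong (e.g. returns the number of files instead of
# the number of examples), each "epoch" is only a handful of steps
print("train examples:", len(train_dataset))
print("eval examples:", len(eval_dataset))
steps_per_epoch = len(train_dataset) // training_args.per_device_train_batch_size
print("approx. optimizer steps per epoch:", steps_per_epoch)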

Resulting eval metrics:

Does anybody know why it does not converge?

Best,

G. Sulpizio.