Training a Chinese and English masked language modeling (MLM) BERT model
--num_train_epochs 500
--per_device_train_batch_size 8
--learning_rate 1e-4
--warmup_steps 5000
--max_seq_length 512
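In case it helps anyone landing here: a minimal sketch of how those flags would map onto `TrainingArguments` if you drive the Trainer from your own script instead of the `run_mlm.py` example. The output path below is a placeholder I made up, and note that `max_seq_length` is a flag of the example script's tokenization step, not a `TrainingArguments` field.

```python
# Minimal sketch: the posted flags expressed as TrainingArguments.
# "./mlm-bert-zh-en" is a placeholder output path, not from the post.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mlm-bert-zh-en",
    num_train_epochs=500,
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    warmup_steps=5000,
    # max_seq_length has no TrainingArguments equivalent; in run_mlm.py it
    # controls tokenizer truncation, e.g. tokenizer(..., max_length=512).
)
```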
Hey! This is probably too late, but this is most likely due to a high learning rate. You could reduce the learning rate and/or use gradient clipping. The latter will not significantly change performance, but it will stop a bad mini-batch from derailing your model's training. Good luck!
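To make the two suggestions concrete, here is a hedged sketch. With the Trainer, clipping is controlled by `max_grad_norm` (it already defaults to 1.0); in a hand-written PyTorch loop you clip the gradient norm just before `optimizer.step()`. The tiny linear model and random batch below are only stand-ins so the snippet runs.

```python
# Sketch of a lower learning rate plus gradient-norm clipping in a manual
# PyTorch training step. The linear model and random batch are stand-ins.
import torch

model = torch.nn.Linear(512, 2)                              # stand-in for BERT
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)   # reduced from 1e-4

batch = torch.randn(8, 512)                                  # fake mini-batch
loss = model(batch).pow(2).mean()                            # stand-in loss
loss.backward()

# Clip the global gradient norm so one bad mini-batch cannot blow up the step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()

# With the Trainer instead, the equivalent knob is:
# TrainingArguments(..., learning_rate=5e-5, max_grad_norm=1.0)
```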