BERT training loss suddenly explodes

I am training a Chinese and English BERT model with the masked language modeling (MLM) objective, using these arguments:
--num_train_epochs 500
--per_device_train_batch_size 8
--learning_rate 1e-4
--warmup_steps 5000
--max_seq_length 512 \
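
To see what learning rate the optimizer actually gets at a given step with these settings, here is a minimal sketch of a linear warmup followed by linear decay. It assumes the default linear scheduler is in use, which the post does not confirm. The function name `linear_warmup_lr` and the value of `total_steps` are hypothetical; the real total depends on dataset size, batch size, and the number of epochs.

```python
def linear_warmup_lr(step, base_lr=1e-4, warmup_steps=5000, total_steps=100_000):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    base_lr and warmup_steps come from the arguments above.
    total_steps is a placeholder (assumption), not a value from the post.
    """
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # After warmup, decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 2500, 5000, 52_500, 100_000):
        print(step, linear_warmup_lr(step))
```

Comparing the step at which the loss blows up against this curve can show whether the jump happens near the peak learning rate, right at the end of warmup.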