Training loss is not decreasing using TFBertModel

I have used TFBertModel and AutoModel from the transformers library to train a two-class classification task, and the training loss is not decreasing.

import tensorflow as tf
from transformers import TFBertModel

bert = TFBertModel.from_pretrained('bert-base-uncased')

# SEQ_LEN is the maximum sequence length used when tokenizing (defined elsewhere)
input_ids = tf.keras.layers.Input(shape=(SEQ_LEN,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(SEQ_LEN,), name='attention_mask', dtype='int32')

# index 1 is the pooled output (the transformed [CLS] representation)
embeddings = bert(input_ids, attention_mask=mask)[1]
X = tf.keras.layers.Dropout(0.1)(embeddings)
X = tf.keras.layers.Dense(128, activation='relu')(X)
y = tf.keras.layers.Dense(1, activation='sigmoid', name='outputs')(X)

bert_model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)
bert_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
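
For context, this model is trained with a plain Keras fit call along the lines of the sketch below; the variable names, batch size, and epoch count are placeholders, not taken from the original post:

# Hypothetical training call (placeholder variables, not from the post).
# train_ids and train_mask are int32 arrays of shape (num_examples, SEQ_LEN)
# produced by the tokenizer; train_labels are 0/1 targets for the sigmoid head.
bert_model.fit(
    {'input_ids': train_ids, 'attention_mask': train_mask},
    train_labels,
    batch_size=32,
    epochs=3,
)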

But when I use TFBertForSequenceClassification, the model converges quickly and the training loss goes to zero.

import tensorflow as tf
from transformers import TFBertForSequenceClassification

bert_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5, epsilon=1e-08)
bert_model.compile(loss=loss, optimizer=optimizer, metrics=[metric])
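
For completeness, a typical way to feed such a model looks roughly like the sketch below; the tokenizer call and fit arguments are illustrative, not taken from the original post:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
# texts is a list of strings and labels an array of integer class ids (0 or 1);
# both are placeholders for whatever dataset was actually used
encodings = tokenizer(texts, truncation=True, padding='max_length',
                      max_length=SEQ_LEN, return_tensors='tf')
bert_model.fit(dict(encodings), labels, batch_size=32, epochs=3)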

I want to use the sequence output of BERT, so I need to load the model with TFBertModel or something similar that returns the raw outputs of BERT.
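
For illustration, a minimal sketch of using the sequence output (the per-token hidden states) instead of the pooled output; the average-pooling layer here is just one possible choice and ignores padding for brevity, it is not from the original post:

import tensorflow as tf
from transformers import TFBertModel

bert = TFBertModel.from_pretrained('bert-base-uncased')
input_ids = tf.keras.layers.Input(shape=(SEQ_LEN,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(SEQ_LEN,), name='attention_mask', dtype='int32')

# index 0 is the sequence output: hidden states of shape (batch, SEQ_LEN, hidden_size)
sequence_output = bert(input_ids, attention_mask=mask)[0]
# reduce the token dimension before the classifier, e.g. with average pooling
pooled = tf.keras.layers.GlobalAveragePooling1D()(sequence_output)
X = tf.keras.layers.Dense(128, activation='relu')(pooled)
y = tf.keras.layers.Dense(1, activation='sigmoid', name='outputs')(X)
seq_model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)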

@Rocketknight1

Your code in the first block looks like it would work. I suspect the problem is one of two things:

  1. The dropout may prevent the model from fully converging.
  2. The default learning rate for Adam is 1e-3, which is much too high for training Transformer models. Try learning rates in the range 1e-5 to 1e-4 (a sketch of both changes is shown below).

If training loss is still not decreasing even with a lower learning rate and no dropout, let me know and I'll investigate further.
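
A minimal sketch of those changes applied to the first model, i.e. dropping the Dropout layer and compiling with an explicit, smaller Adam learning rate (the value 2e-5 is just one example within the suggested range):

# reuses embeddings, input_ids and mask from the first code block, but no Dropout layer
X = tf.keras.layers.Dense(128, activation='relu')(embeddings)
y = tf.keras.layers.Dense(1, activation='sigmoid', name='outputs')(X)
bert_model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)

# explicit Adam instance with a Transformer-friendly learning rate instead of the 1e-3 default
bert_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)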

Thank you for your reply. I removed the dropout layer and changed the learning rate to a smaller value, and it worked!

I’m trying to fine-tune “google/mobilebert-uncased”, but it seems the model is not learning and the loss stays high. When I fine-tuned bert-base-uncased on the same dataset with the same hyperparameters, the model converged and reached an accuracy of 92%. I’m using the default lr=2e-5 and have tried other values too. Any suggestions?


Any follow-up here? I recently tried to fine-tune the “google/mobilebert-uncased” model and observed the same non-learning behavior. What might be causing the problem?