Regression failing when fine-tuning with BERT/GPT-2/ALBERT

I have been trying to fine-tune BertModel, ALBERT, and GPT-2 on my regression task, and I keep getting unwanted results, which I describe below:

I tried it two times:

  1. I used the CLS token embedding and fine-tuned my entire custom model end-to-end, but it produced the same random number repeated all over my output matrix.

  2. I simply passed the CLS token embedding to a feed-forward NN. In this case it also produced a random constant.

What could be the solution to this problem?

class Custom_GPT(tf.keras.Model):

    def __init__(self, embedding_dim):
        super(Custom_GPT, self).__init__()
        self.embedding_dim = embedding_dim
        self.dense = tf.keras.layers.Dense(1, input_shape=(embedding_dim,),
                                           activation=None, name='dense_layer_1')
        self.GPT_layer = GPT_model

    def call(self, input_ids):
        sequence = self.GPT_layer(input_ids)[0]
        cls = sequence[:, 0, :]
        x = self.dense(cls)

The model doesn’t seem to be learning anything here; it just generates the same random constant repeatedly.

Are you returning x from call? I am not familiar with TensorFlow, but I assume you still have to return the final output; otherwise call will implicitly return None.
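To illustrate why a missing return statement produces this behavior, here is a minimal sketch in plain Python (no TensorFlow needed, since this is just Python function semantics): a method without an explicit return implicitly returns None, so whatever x was computed inside call is silently discarded. The class names below are made up for illustration.

```python
# A method that computes a result but never returns it: the caller gets None.
class BrokenModel:
    def call(self, x):
        y = x * 2  # result computed but discarded

# Same computation, but with the explicit return the call method needs.
class FixedModel:
    def call(self, x):
        y = x * 2
        return y

print(BrokenModel().call(3))  # None
print(FixedModel().call(3))   # 6
```

Applied to the Custom_GPT class above, the fix would be adding `return x` as the last line of call, so the dense layer's output actually reaches the loss function during training.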