Correct way to implement a custom model on top of pretrained BERT?

My team and I (beginners) are working on an ML project with Hugging Face BERT, performing binary classification of sentences based on an attribute of each sentence. Below is the code for our custom model built on top of the
neuralspace-reverie/indic-transformers-bn-bert pretrained model. We are unsure whether the ordering of dropout and the activation function matters inside the class, and whether our implementation of the custom hidden layers is correct overall. We intended to add two dense layers (l1 and l2) with a tanh activation function and dropout of 0.1, with 512 and 256 nodes respectively, and a softmax output layer.

import torch.nn as nn
from transformers import AutoConfig, AutoModel

class MyTaskSpecificCustomModel(nn.Module):
    def __init__(self, checkpoint, num_labels):
        super(MyTaskSpecificCustomModel, self).__init__()
        self.num_labels = num_labels
        
        self.model = AutoModel.from_pretrained(
            checkpoint,
            config=AutoConfig.from_pretrained(
                checkpoint, output_attentions=True, output_hidden_states=True
            ),
        )
        
        # This is to freeze the weights of the pretrained model.
        for param in self.model.parameters():
            param.requires_grad = False
            
        # New Layer
        self.dropout1 = nn.Dropout(0.1)
        # self.classifier = nn.Linear(768, num_labels )
        #layer 1
        self.dropout2 = nn.Dropout(0.1)
        self.activation1 = nn.Tanh()
        self.l1 = nn.Linear(768, 512)

        #layer 2
        self.dropout3 = nn.Dropout(0.1)
        self.l2 = nn.Linear(512, 256)
        self.activation2 = nn.Tanh()

        #layer 3
        self.l3 = nn.Linear(256, num_labels)
        # self.activation3 = nn.Tanh()
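        # LogSoftmax returns log-probabilities, so the matching loss is nn.NLLLoss
        # (alternatively, drop this layer and use nn.CrossEntropyLoss on raw logits)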
        self.softmax = nn.LogSoftmax(dim=1)
        
    def forward(self, input_ids=None, attention_mask=None, Type=None):
        outputs = self.model(input_ids = input_ids, attention_mask = attention_mask)
        last_hidden_state = outputs[0]       
        sequence_outputs = self.dropout1(last_hidden_state)
        
        # logits = self.classifier(sequence_outputs[:, 0, : ].view(-1, 768))
        #layer 1
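        # sequence_outputs[:, 0, :] is the [CLS] token embedding, shape (batch_size, 768)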
        logits = self.l1(sequence_outputs[:, 0, : ].view(-1, 768))
        logits = self.dropout2(logits)
        logits = self.activation1(logits)

        #layer 2
        logits = self.l2(logits)
        logits = self.dropout3(logits)
        logits = self.activation2(logits)

        #output layer
        logits = self.l3(logits)
        logits = self.softmax(logits)

        return logits
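
For reference, this is roughly how we call the model during training (a minimal sketch with a dummy batch; the optimizer choice, learning rate, and the use of nn.NLLLoss to match the LogSoftmax output are our assumptions, not fixed parts of our pipeline):

import torch
import torch.nn as nn

checkpoint = "neuralspace-reverie/indic-transformers-bn-bert"
model = MyTaskSpecificCustomModel(checkpoint, num_labels=2)

# LogSoftmax outputs log-probabilities, so NLLLoss is the matching criterion.
criterion = nn.NLLLoss()
# Only the new layers are trainable, since the pretrained weights are frozen.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)

# Dummy batch of 4 already-tokenized sentences of length 16, for illustration only.
input_ids = torch.randint(0, 1000, (4, 16))
attention_mask = torch.ones(4, 16, dtype=torch.long)
labels = torch.randint(0, 2, (4,))

model.train()
log_probs = model(input_ids=input_ids, attention_mask=attention_mask)
loss = criterion(log_probs, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()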