Understanding how to implement a custom BERT model

I am working on a text classification problem. Each text can consist of two or more sentences. My feeling is that I should obtain a BERT embedding for each sentence separately, then pass each embedding through a neural network to obtain per-sentence features, concatenate all of those features together, and finally classify with a binary classifier.

I am planning to do this in PyTorch. Based on the task at hand, my intuition is that obtaining a BERT embedding for each sentence separately will be more beneficial than encoding the whole text at once.
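To make the idea concrete, here is a rough sketch of the architecture I have in mind. All the module names and sizes are my own guesses, and I have replaced the BERT encoder with a plain Linear layer as a stand-in so the sketch runs without downloading weights; in the real model that would be BertModel.from_pretrained(...):

```python
import torch
import torch.nn as nn

class SentenceConcatClassifier(nn.Module):
    """Sketch only: encode each sentence, build per-sentence features,
    concatenate them, and classify with a binary head."""
    def __init__(self, num_sentences=2, hidden_size=768, feat_size=128):
        super().__init__()
        # Stand-in for a BERT encoder; in the real model this would be
        # transformers' BertModel.from_pretrained(...).
        self.encoder = nn.Linear(hidden_size, hidden_size)
        # Per-sentence feature extractor applied to each sentence embedding.
        self.per_sentence = nn.Linear(hidden_size, feat_size)
        # Binary classifier over the concatenated per-sentence features.
        self.classifier = nn.Linear(num_sentences * feat_size, 1)

    def forward(self, sentence_embeddings):
        # sentence_embeddings: list of (batch, hidden_size) tensors,
        # one tensor per sentence, processed one by one.
        feats = [torch.relu(self.per_sentence(self.encoder(e)))
                 for e in sentence_embeddings]
        concat = torch.cat(feats, dim=-1)  # (batch, num_sentences * feat_size)
        return torch.sigmoid(self.classifier(concat)).squeeze(-1)

model = SentenceConcatClassifier()
out = model([torch.randn(4, 768), torch.randn(4, 768)])
print(out.shape)  # torch.Size([4])
```

Is this roughly the right shape for such a model, or is there a more standard way to structure it?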

I have the following doubts:

  1. Is there any example that takes such an approach?
  2. Should my model inherit from PyTorch's nn.Module or from Hugging Face transformers' BertPreTrainedModel?
  3. How can I implement this concatenation logic: building the sub-networks in the __init__ method, and passing the input sentences through their corresponding networks one by one in the forward method?
  4. What is the line pooled_output = outputs[1] doing in models derived from BertPreTrainedModel? Is it obtaining a reference to the embedding corresponding to the [CLS] token?
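Regarding point 4, my current understanding is that the pooled output is not the raw [CLS] embedding itself but the [CLS] hidden state passed through an extra dense layer with a tanh activation (the model's "pooler"). Here is a minimal reimplementation of that idea as I understand it, to check whether I have the semantics right; MiniPooler is my own name for this sketch:

```python
import torch
import torch.nn as nn

class MiniPooler(nn.Module):
    """My understanding of what produces pooled_output: take the hidden
    state at the first position ([CLS]) and apply Linear + tanh."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size), i.e. the
        # per-token outputs of the last encoder layer.
        first_token = hidden_states[:, 0]  # the [CLS] position
        return self.activation(self.dense(first_token))

pooler = MiniPooler()
pooled = pooler(torch.randn(2, 16, 768))
print(pooled.shape)  # torch.Size([2, 768])
```

Is that what outputs[1] refers to, or does it index something else in the returned tuple?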