Extracting the output of hidden BERT layers and re-training the BERT model on custom datasets

Hi All,
I am trying to create a multi-label sentiment analysis classifier (number of classes = 28), and my goal is to:

1. Train the various BERT layers for my specific task (using a pre-trained tokenizer from BERT-Base or DistilBERT).
2. Conduct experiments by extracting the output of hidden BERT layers, combining it (adding/averaging) with the final ‘CLS’ output, and comparing the metrics.

My questions are:

  • How do I re-train a transformer model from scratch? I do not want to use the pre-trained weights.
  • How do I extract the output of a hidden transformer block to combine with ‘CLS’ to generate a prediction?
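
To make the two questions concrete, here is a minimal sketch of the kind of setup being asked about, assuming the Hugging Face transformers library and PyTorch. Building the model from a BertConfig instead of from_pretrained gives randomly initialised weights, and output_hidden_states=True exposes the output of every layer. The class name LayerCombiningClassifier, the small layer sizes, and the random input IDs are placeholders for illustration, not a recommended configuration; the vocabulary size is assumed to match the BERT-Base uncased tokenizer.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class LayerCombiningClassifier(nn.Module):
    """Hypothetical multi-label head that mixes 'CLS' vectors from several layers."""

    def __init__(self, config, num_labels=28, layers=(-1, -2), combine="mean"):
        super().__init__()
        # Constructing from a config (not from_pretrained) means the weights
        # are randomly initialised, so no pre-trained weights are used.
        self.bert = BertModel(config)
        self.layers = layers      # which hidden layers to read; -1 is the last block
        self.combine = combine    # "mean" to average, anything else to add
        self.classifier = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        )
        # out.hidden_states is a tuple of num_hidden_layers + 1 tensors,
        # each of shape (batch, seq_len, hidden); index 0 is the embedding output.
        # Position 0 along seq_len is the 'CLS' token.
        cls_vectors = torch.stack([out.hidden_states[i][:, 0] for i in self.layers])
        if self.combine == "mean":
            pooled = cls_vectors.mean(dim=0)
        else:
            pooled = cls_vectors.sum(dim=0)
        return self.classifier(pooled)


# Deliberately small, illustrative sizes so the sketch runs quickly.
config = BertConfig(
    vocab_size=30522,          # assumed to match the pre-trained tokenizer's vocabulary
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=256,
)
model = LayerCombiningClassifier(config)

# Random token IDs stand in for real tokenizer output.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)
logits = model(input_ids, attention_mask)

# Multi-label targets: one independent 0/1 flag per class.
labels = torch.randint(0, 2, (2, 28)).float()
loss = nn.BCEWithLogitsLoss()(logits, labels)
```

In this sketch the loss can be backpropagated through all layers, so every block is trained for the task. The same pattern should carry over to DistilBERT via DistilBertConfig and DistilBertModel, and changing the layers tuple is one way to compare different layer combinations.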

Appreciate any help and pointers!