I am trying to build a multi-label sentiment analysis classifier (number of classes = 28), and my goal is to:
1. Train the BERT layers for my specific task (using a pre-trained tokenizer from BERT-Base or DistilBERT).
2. Conduct experiments by extracting the outputs of hidden BERT layers, combining them (adding/averaging) with the ‘CLS’ output, and comparing the metrics.
My questions are:
- How do I train a transformer model from scratch? I do not want to use the pre-trained weights.
- How do I extract the output of a hidden transformer block and combine it with ‘CLS’ to generate a prediction?
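For the first question, this is roughly what I have in mind: build the architecture from a config instead of loading a checkpoint, so the weights are randomly initialized. This is just a sketch using the Hugging Face `transformers` API (the hard-coded vocab size is the standard `bert-base-uncased` one; in practice I would take it from the tokenizer):

```python
# Sketch: instantiate BERT with random weights (no pre-trained checkpoint).
# Constructing from a BertConfig, rather than calling from_pretrained,
# gives a freshly initialized model.
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(
    vocab_size=30522,  # bert-base-uncased vocab size (tokenizer is still pre-trained)
    num_labels=28,     # my multi-label task
    problem_type="multi_label_classification",  # -> BCE-with-logits loss per label
)
model = BertForSequenceClassification(config)  # random weights, trained from scratch
```

The tokenizer can still come from `BertTokenizerFast.from_pretrained("bert-base-uncased")`, since only the model weights need to be untrained.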
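For the second question, my rough idea looks like the sketch below: ask the model for all hidden states, take the ‘CLS’ position from the last layer and from one intermediate layer, and average them before a classification head. This is only an illustration with a randomly initialized `BertModel` and dummy inputs; layer 8 and the linear head are my own arbitrary choices:

```python
import torch
from transformers import BertConfig, BertModel

# output_hidden_states=True makes the model return every layer's output
config = BertConfig(output_hidden_states=True)
model = BertModel(config)  # randomly initialized, just for shape illustration
model.eval()

# Dummy batch: 2 sequences of length 16
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask)

# out.hidden_states is a tuple of 13 tensors for BERT-Base
# (embeddings + 12 layers), each of shape (batch, seq_len, hidden_size)
cls_last = out.last_hidden_state[:, 0]   # 'CLS' position, final layer
cls_mid = out.hidden_states[8][:, 0]     # 'CLS' position, an intermediate layer

combined = (cls_last + cls_mid) / 2      # averaging; could also try adding/concat

classifier = torch.nn.Linear(config.hidden_size, 28)  # 28-label head
logits = classifier(combined)            # shape (batch, 28)
```

Is this the right way to get at the intermediate blocks, or is there a cleaner hook-based approach?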
Appreciate any help and pointers!