I am working on a text classification problem where each text can be made up of two or more sentences. My feeling is that I should obtain the BERT embedding of each sentence separately, pass each embedding through a neural network to obtain per-sentence features, concatenate all of these features together, and finally classify with a binary classifier.
I plan to do this in PyTorch. Based on the task at hand, my intuition is that obtaining the BERT embedding of each sentence separately will be more beneficial.
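To make the idea concrete, here is a minimal sketch of what I mean by obtaining the BERT embedding of each sentence separately (assuming `bert-base-uncased` and the huggingface `transformers` API; the variable names are just illustrative):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentences = ["First sentence of the text.", "Second sentence of the text."]

per_sentence_embeddings = []
with torch.no_grad():
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        outputs = bert(**enc)
        # outputs[1] is the pooled output, shape (1, 768)
        per_sentence_embeddings.append(outputs[1])

# concatenate the per-sentence vectors, shape (1, 2 * 768)
features = torch.cat(per_sentence_embeddings, dim=-1)
```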
I have the following doubts:

- Is there any example taking such an approach?
- I am not sure what I should do: should my model inherit from `nn.Module` (PyTorch) or from `BertPreTrainedModel` (huggingface `transformers`)?
- How can I implement this concatenation logic in the `__init__` method (for building the model) and in the `forward` method (for passing the input sentences through the corresponding neural network one by one)? A rough sketch of what I have in mind is at the end of this post.
- What is the line `pooled_output = outputs[1]` doing in `BertPreTrainedModel`? Is it obtaining a reference to the embedding corresponding to the `[CLS]` token?
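For reference, this is roughly the kind of model I am imagining, inheriting from `nn.Module` (just a sketch, assuming a fixed number of sentences per example and a shared feed-forward feature extractor; all sizes and names are placeholders):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiSentenceClassifier(nn.Module):
    def __init__(self, num_sentences=2, hidden_size=768, feature_size=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # small feed-forward "feature extractor", shared across sentences
        self.sentence_ff = nn.Sequential(
            nn.Linear(hidden_size, feature_size),
            nn.ReLU(),
        )
        # binary classifier over the concatenated per-sentence features
        self.classifier = nn.Linear(num_sentences * feature_size, 1)

    def forward(self, input_ids_list, attention_mask_list):
        # input_ids_list / attention_mask_list: one tensor per sentence,
        # each of shape (batch_size, seq_len)
        sentence_features = []
        for input_ids, attention_mask in zip(input_ids_list, attention_mask_list):
            outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            pooled_output = outputs[1]  # (batch_size, hidden_size)
            sentence_features.append(self.sentence_ff(pooled_output))
        concatenated = torch.cat(sentence_features, dim=-1)
        return self.classifier(concatenated)  # logits
```

My plan would be to train this with `nn.BCEWithLogitsLoss` on the returned logits, but I am not sure whether this is the right way to structure the `__init__` and `forward` methods.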