I want to add a GRU "layer" from PyTorch after the pretrained BertModel as follows, but I am not sure about the `input_size`.
```python
import torch.nn as nn
from transformers import BertModel

class BERT_Arch(nn.Module):
    def __init__(self):
        super(BERT_Arch, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        # GRU after BERT -- this input_size is what I'm unsure about
        self.gru = nn.GRU(input_size=?, hidden_size=256, num_layers=2)

    def forward(self, ...):
        ...
```
The `input_size` of the GRU should be the output size of BERT, right? According to the docs, BertModel returns a `pooler_output`. Would I need to feed that to the GRU?
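To illustrate what I mean about shapes, here is a minimal sketch with a dummy tensor standing in for BERT's output. It assumes (my assumption, not confirmed) that bert-base-uncased's hidden size is 768, so its `last_hidden_state` would have shape `[batch, seq_len, 768]`, while `pooler_output` is a single `[batch, 768]` vector with no time dimension for a GRU to recur over:

```python
import torch
import torch.nn as nn

# Dummy shapes standing in for BERT output (assumed hidden size 768
# for bert-base-uncased); no model download needed for this sketch.
batch, seq_len, hidden = 2, 10, 768
last_hidden_state = torch.randn(batch, seq_len, hidden)

# If the GRU consumes the per-token sequence, input_size would have to
# match that last dimension (768 here).
gru = nn.GRU(input_size=hidden, hidden_size=256, num_layers=2, batch_first=True)
out, h_n = gru(last_hidden_state)

print(out.shape)  # torch.Size([2, 10, 256])
print(h_n.shape)  # torch.Size([2, 2, 256]) -- (num_layers, batch, hidden_size)
```

Is this the right way to think about it, i.e. should the GRU see the full token sequence rather than `pooler_output`?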