BERT followed by a GRU

I want to add a GRU layer from PyTorch after the pretrained BertModel, as follows, but I am not sure about the input_size.

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()

        self.bert = BertModel.from_pretrained('bert-base-uncased')
        # GRU arguments: input_size, hidden_size, num_layers
        self.gru = nn.GRU(input_size=?, hidden_size=256, num_layers=2)

    def forward(...)

The input_size of the GRU should be the output size of BERT, right?

According to the docs, BERT returns pooler_output. Would I need to feed that to the GRU?

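It depends on what you want. The base BERT model returns both the final hidden state, of shape (batch_size, sequence_length, hidden_size), and the pooler output, of shape (batch_size, hidden_size), which is derived from the final hidden state of the [CLS] token. In both cases the last dimension is BERT's hidden_size (768 for bert-base-uncased), so that is the value to use for the GRU's input_size. Since a GRU consumes a sequence, the final hidden state is usually the natural input (with batch_first=True on the GRU); the pooler output is a single vector per example rather than a sequence.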
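As a rough sketch of the sequence option, the module below feeds BERT's final hidden state into a GRU and reads input_size from the model config instead of hard-coding it. The class name BertGRU and the tiny randomly initialised BertConfig are assumptions made for illustration, so that the example runs without downloading weights; in practice you would pass BertModel.from_pretrained('bert-base-uncased') instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel


class BertGRU(nn.Module):
    """BERT encoder followed by a GRU over the per-token hidden states."""

    def __init__(self, bert, gru_hidden_size=256, num_layers=2):
        super().__init__()
        self.bert = bert
        # input_size comes from the BERT config (768 for bert-base-uncased).
        self.gru = nn.GRU(
            input_size=bert.config.hidden_size,
            hidden_size=gru_hidden_size,
            num_layers=num_layers,
            batch_first=True,  # BERT outputs are (batch, seq_len, hidden)
        )

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_states = outputs.last_hidden_state  # (batch, seq_len, hidden)
        # gru_out: (batch, seq_len, gru_hidden)
        # h_n:     (num_layers, batch, gru_hidden)
        gru_out, h_n = self.gru(token_states)
        return gru_out, h_n


# Assumption for illustration: a tiny, randomly initialised BERT
# instead of the pretrained 'bert-base-uncased' weights.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertGRU(BertModel(config))
model.eval()

input_ids = torch.randint(0, config.vocab_size, (4, 10))  # batch of 4, length 10
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    gru_out, h_n = model(input_ids, attention_mask)

print(gru_out.shape)  # torch.Size([4, 10, 256])
print(h_n.shape)      # torch.Size([2, 4, 256])
```

Note that this sketch passes padded positions through the GRU unchanged; if your batches contain padding, you may want to pack the sequences (for example with torch.nn.utils.rnn.pack_padded_sequence) before the GRU.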