BERT followed by a GRU

I want to add a GRU "layer" from PyTorch after the pretrained BertModel as follows, but I am not sure about the input_size.

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')

        # GRU: nn.GRU(input_size, hidden_size, num_layers)
        self.gru = nn.GRU(input_size=?, hidden_size=256, num_layers=2)

    def forward(...):
        ...

The input_size of the GRU should be the output size of the Bert, right?

According to the docs, BERT returns a pooler_output. Would I need to feed that into the GRU?

It depends on what you want: the base BERT model returns both the last hidden state, of shape (batch_size, sequence_length, hidden_size), and the pooler output, which holds the processed state of the [CLS] token and has shape (batch_size, hidden_size). The pooler output is a single vector per sequence, so there is no sequence left for a GRU to run over; to use a GRU you would feed it the last hidden state, which makes input_size equal to BERT's hidden_size (768 for bert-base-uncased).
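A minimal sketch of the wiring, using a dummy tensor in place of the real last hidden state so it runs without downloading the model (the shapes match what BertModel for bert-base-uncased would produce; the batch and sequence sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# bert-base-uncased has hidden_size=768, so that is the GRU's input_size.
BERT_HIDDEN = 768
gru = nn.GRU(input_size=BERT_HIDDEN, hidden_size=256,
             num_layers=2, batch_first=True)

# Stand-in for outputs.last_hidden_state:
# shape (batch_size, sequence_length, hidden_size)
fake_last_hidden_state = torch.randn(4, 16, BERT_HIDDEN)

gru_out, h_n = gru(fake_last_hidden_state)
print(gru_out.shape)  # torch.Size([4, 16, 256]) - one 256-dim vector per token
print(h_n.shape)      # torch.Size([2, 4, 256]) - final state per GRU layer
```

Note batch_first=True: BERT returns batch-first tensors, while nn.GRU defaults to (seq_len, batch, features), so without it the dimensions would be silently misinterpreted.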