I have a BERT encoder model, and I want to feed its last hidden state into GPT2 as encoder-decoder (cross) attention. I can't find an option in transformers.GPT2Config for using an encoder's last hidden layer as an input to GPT2. How do I achieve this?
I want something like this:
from transformers import RobertaForMaskedLM, GPT2LMHeadModel

inputs = dict(input_ids=input_ids, token_type_ids=token_type_ids,
              labels=labels, attention_mask=attention_mask)
encoder = RobertaForMaskedLM(config=encoder_config)
encoder_output = encoder(**inputs, output_hidden_states=True)  # so hidden_states is populated
last_hidden_layer = encoder_output.hidden_states[-1]
decoder = GPT2LMHeadModel(config=decoder_config)
# the call I'm looking for: have the decoder cross-attend to last_hidden_layer
decoder_output = decoder(**inputs, encoder_hidden_states=last_hidden_layer)
where last_hidden_layer is fed as the encoder-decoder (cross) attention input to every transformer block in GPT2.
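In case a concrete reference helps frame the question: the closest thing I've found is transformers' EncoderDecoderModel, which (as far as I understand) routes the encoder's last hidden state into the decoder's cross-attention. Below is a minimal sketch of that approach; I'm not sure whether GPT2 is fully supported as the decoder there, or whether it gives exactly the per-block cross-attention described above, which is why I'm asking.

from transformers import AutoTokenizer, EncoderDecoderModel

# Tie a RoBERTa encoder to a GPT2 decoder; as I understand it, this routes the
# encoder's last hidden state into the decoder's cross-attention layers.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "gpt2")

enc_tok = AutoTokenizer.from_pretrained("roberta-base")
dec_tok = AutoTokenizer.from_pretrained("gpt2")

src = enc_tok("text for the encoder", return_tensors="pt")
tgt = dec_tok("text for the decoder", return_tensors="pt")

# passing decoder_input_ids directly avoids having to set decoder_start_token_id
outputs = model(input_ids=src.input_ids, attention_mask=src.attention_mask,
                decoder_input_ids=tgt.input_ids)
print(outputs.logits.shape)  # (batch, target length, GPT2 vocab size)

Even if EncoderDecoderModel is the intended way, I'd still like to know whether the same wiring can be done with two separately instantiated models, as in the snippet above.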