How to add encoder's last hidden state to GPT2 as encoder-decoder attention

I have a BERT-style encoder model and I want to feed its last hidden state to GPT2 as encoder-decoder (cross-) attention. I can't find an option in transformers.GPT2Config for using the encoder's last hidden state as an additional input to GPT2. How do I achieve this?

I want something like this:

from transformers import RobertaForMaskedLM, GPT2LMHeadModel

# inputs is the usual dict of tensors; encoder_config / decoder_config are defined elsewhere
inputs = {"input_ids": input_ids, "token_type_ids": token_type_ids,
          "labels": labels, "attention_mask": attention_mask}

encoder           = RobertaForMaskedLM(config=encoder_config)
encoder_output    = encoder(**inputs, output_hidden_states=True)
last_hidden_state = encoder_output.hidden_states[-1]

decoder        = GPT2LMHeadModel(config=decoder_config)
decoder_output = decoder(**inputs, encoder_hidden_states=last_hidden_state)  # <- no such option as far as I can tell

where last_hidden_state is fed into the encoder-decoder (cross-) attention of every transformer block in GPT2.
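
For completeness, here is a rough end-to-end sketch of the setup I am imagining. The add_cross_attention flag and the encoder_hidden_states / encoder_attention_mask arguments are my guesses at how such an option might be exposed; I have not confirmed that this is supported or that it is the right way to do it:

import torch
from transformers import RobertaConfig, RobertaModel, GPT2Config, GPT2LMHeadModel

# Encoder: a plain RoBERTa encoder whose last hidden state the decoder should attend to
encoder = RobertaModel(RobertaConfig())

# Decoder: GPT2 with cross-attention layers (add_cross_attention=True is my assumption)
decoder_config = GPT2Config(add_cross_attention=True)
decoder = GPT2LMHeadModel(decoder_config)

# Dummy batch just to illustrate the shapes
input_ids      = torch.randint(0, encoder.config.vocab_size, (1, 16))
attention_mask = torch.ones_like(input_ids)

encoder_output    = encoder(input_ids=input_ids, attention_mask=attention_mask)
last_hidden_state = encoder_output.last_hidden_state  # (batch, seq_len, hidden)

decoder_input_ids = torch.randint(0, decoder_config.vocab_size, (1, 16))
decoder_output = decoder(
    input_ids=decoder_input_ids,
    labels=decoder_input_ids,
    encoder_hidden_states=last_hidden_state,  # what each GPT2 block should cross-attend to
    encoder_attention_mask=attention_mask,
)
print(decoder_output.loss)

If there is a more idiomatic way to wire this up in transformers (for example an existing encoder-decoder wrapper), pointers would be appreciated.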