How to add a pointer network on top of GPT2?

I am trying to build a dialog generation model using the popular GPT2LMHeadModel from Hugging Face's transformers Python library.

Due to the nature of the task, an accurate and informative response usually contains parts of a knowledge snippet and of the dialog context, both of which are given as inputs to the model. I therefore want to add a pointer network directly on top of the GPT-2 decoder so the model can copy words directly from the input. Since GPT-2 is a decoder-only architecture, I am having some difficulty implementing this (the references I can find are for encoder-decoder architectures such as T5 and RoBERTa).
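To make the question concrete, here is a minimal sketch of the kind of head I have in mind, in the pointer-generator style: a copy attention over the input tokens, a sigmoid gate mixing the generate and copy distributions, and a scatter-add that maps copy attention back onto the vocabulary. This is plain PyTorch, not tied to the transformers API; `hidden_size` and `vocab_size` would come from the GPT-2 config, and the tensor names (`dec_hidden`, `src_hidden`, `src_ids`) are my own assumptions about how the pieces would be wired up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerGeneratorHead(nn.Module):
    """Sketch of a copy mechanism sitting on top of a decoder-only model.

    hidden_size / vocab_size would come from the GPT-2 config
    (config.n_embd and config.vocab_size).
    """

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # bilinear-style attention over the input (snippet + context) tokens
        self.attn = nn.Linear(hidden_size, hidden_size, bias=False)
        # gate deciding generate-vs-copy at each decoding step
        self.p_gen = nn.Linear(2 * hidden_size, 1)

    def forward(self, dec_hidden, src_hidden, src_ids):
        # dec_hidden: (B, T, H) decoder states at the generation positions
        # src_hidden: (B, S, H) states over the input tokens we may copy
        # src_ids:    (B, S)    vocabulary ids of those input tokens
        scores = torch.einsum("bth,bsh->bts", self.attn(dec_hidden), src_hidden)
        attn = F.softmax(scores, dim=-1)                       # (B, T, S)
        context = torch.einsum("bts,bsh->bth", attn, src_hidden)
        gate = torch.sigmoid(self.p_gen(torch.cat([dec_hidden, context], -1)))
        gen_dist = F.softmax(self.lm_head(dec_hidden), dim=-1)  # (B, T, V)
        # scatter the copy attention onto vocabulary ids
        copy_dist = torch.zeros_like(gen_dist).scatter_add_(
            2, src_ids.unsqueeze(1).expand(-1, attn.size(1), -1), attn
        )
        return gate * gen_dist + (1.0 - gate) * copy_dist       # (B, T, V)
```

Because GPT-2 is decoder-only, my current thinking is that both `dec_hidden` and `src_hidden` could be slices of the same last hidden layer (`output_hidden_states=True` in the forward call), taken at the response positions and the input positions respectively, instead of coming from a separate encoder. Is that a reasonable way to do it?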

Can anyone point me to an implementation that uses a pointer network on top of GPT-2, or give me some directions here, please?