Key-value pairs from the attention layers of GPT-2

I am using the pre-trained GPT-2 model and want to extract the key-value pairs of the attention layers from all 12 decoder blocks of this model.
I am trying to understand memory-augmented large language models.
Is there a way to do this?
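
For reference, here is a minimal sketch of what I have been looking at so far. I am assuming the Hugging Face `transformers` library (the original post only says "gpt2 pre-trained model"), and I believe the `past_key_values` returned when `use_cache=True` contains one (key, value) pair per decoder block, but I am not sure this is the right way to get at them:

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# Assumption: Hugging Face transformers with the "gpt2" (small, 12-block) checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Memory augmented language models", return_tensors="pt")

with torch.no_grad():
    # use_cache=True makes the model return the per-block key/value tensors.
    outputs = model(**inputs, use_cache=True)

# past_key_values should hold one entry per decoder block (12 for gpt2);
# each entry is a (key, value) pair of tensors, which I expect to have shape
# (batch_size, num_heads, seq_len, head_dim). In newer transformers versions
# this may come back as a Cache object, but iterating it still yields pairs.
for i, (key, value) in enumerate(outputs.past_key_values):
    print(f"block {i}: key {tuple(key.shape)}, value {tuple(value.shape)}")
```

Is this the intended way to extract them, or should I be registering forward hooks on the attention modules instead?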
Thanks in advance