Swapping GPT-2 Attention with Flash Attention

Hi all,

Is there currently a way to extract the attention module from a model such as GPT-2 and swap it out for Flash Attention?
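For concreteness, here is the kind of swap I have in mind, sketched on a toy block rather than the real GPT-2 classes. Everything here (`ToyAttention`, `FlashAttention`, `Block`, `swap_attention`) is a made-up name for illustration; the fused kernel is PyTorch's `scaled_dot_product_attention`, which can dispatch to a flash-attention backend, not the `flash-attn` package itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttention(nn.Module):
    """Stand-in for the original (materialized-matrix) attention module."""
    def __init__(self, d_model, n_head):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        # Naive path: materialize the full T x T attention matrix
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(y)

class FlashAttention(ToyAttention):
    """Same weights and interface, but routes through the fused SDPA kernel."""
    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        # Fused kernel; never materializes the attention matrix
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))

class Block(nn.Module):
    """Minimal transformer block exposing an .attn attribute, like GPT-2's blocks."""
    def __init__(self, d_model=64, n_head=4):
        super().__init__()
        self.attn = ToyAttention(d_model, n_head)
    def forward(self, x):
        return x + self.attn(x)

def swap_attention(block):
    """Replace block.attn in place, carrying over the pretrained weights."""
    old = block.attn
    new = FlashAttention(old.proj.out_features, old.n_head)
    new.load_state_dict(old.state_dict())
    block.attn = new

torch.manual_seed(0)
block = Block()
x = torch.randn(2, 8, 64)
y_ref = block(x)
swap_attention(block)
y_new = block(x)
print(torch.allclose(y_ref, y_new, atol=1e-4))  # outputs should agree numerically
```

On a real GPT-2 checkpoint the same pattern would iterate over `model.transformer.h` and replace each block's `.attn`. I also understand that recent versions of `transformers` can do this at load time via the `attn_implementation` argument to `from_pretrained` (e.g. `"sdpa"`, or `"flash_attention_2"` when the `flash-attn` package is installed), if that is the recommended route.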

Thank you,

Enrico