How can I visualize the attention of a large encoder-decoder transformer model that isn't available on Hugging Face?

Hello, I am attempting to visualize the attention weights of the model 'Grover' in its inference mode. In this mode, it produces a probability score for each input text. I have the model's checkpoints and config, but I am struggling to convert them into a form I can use to produce visualizations.
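
To make the goal concrete, here is a minimal sketch of the kind of output I am after. It assumes I could somehow extract one layer's attention weights as a NumPy array of shape `(num_heads, seq_len, seq_len)`. The array below is random placeholder data, and the names `average_heads` and `plot_attention` are hypothetical helpers of my own, not part of the Grover code.

```python
import numpy as np


def average_heads(attn):
    """Average attention over heads: (num_heads, seq_len, seq_len) -> (seq_len, seq_len)."""
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 3 or attn.shape[1] != attn.shape[2]:
        raise ValueError("expected shape (num_heads, seq_len, seq_len)")
    return attn.mean(axis=0)


def plot_attention(matrix, tokens, path="attention.png"):
    """Save a token-by-token heatmap of one attention matrix to `path`."""
    # Imported here so the rest of the sketch runs without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    im = ax.imshow(matrix, cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_yticks(range(len(tokens)))
    ax.set_yticklabels(tokens)
    ax.set_xlabel("attended-to token")
    ax.set_ylabel("query token")
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Placeholder data standing in for real attention weights (NOT from Grover):
# 2 heads over 4 tokens, each row softmax-normalized so it sums to 1.
rng = np.random.default_rng(0)
tokens = ["The", "quick", "brown", "fox"]
logits = rng.normal(size=(2, 4, 4))
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

avg = average_heads(weights)
```

Calling `plot_attention(avg, tokens)` would then write the heatmap to `attention.png`. The part I am missing is how to get real weights out of the checkpoint in place of the placeholder array.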

Any help would be greatly appreciated! I am also happy to answer any follow-up questions to clarify anything.