Is there a way to visualize LXMERT's attention?

I would like to observe the attention between each input RoI and each word of the input sentence in LXMERT. If a framework exists that facilitates this, please let me know. If not, could you tell me which of LXMERT's tensors I should inspect?
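In case it helps others with the same question: with the Hugging Face `transformers` implementation of LXMERT, passing `output_attentions=True` makes the model return `cross_encoder_attentions`, one tensor per cross-modality layer, where each word (query) attends over the RoIs (keys). A minimal sketch below uses a small randomly-initialized `LxmertConfig` and random inputs just to show the tensor shapes; in practice you would load the pretrained checkpoint (e.g. `unc-nlp/lxmert-base-uncased`) and real RoI features from a detector such as Faster R-CNN.

```python
import torch
from transformers import LxmertConfig, LxmertModel

# Small random config to keep the sketch cheap; in practice:
#   model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")
config = LxmertConfig(
    vocab_size=1000, hidden_size=64, num_attention_heads=4,
    intermediate_size=128, l_layers=2, x_layers=2, r_layers=2,
)
model = LxmertModel(config)
model.eval()

batch, seq_len, num_rois = 1, 6, 36
input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))      # tokenized sentence in practice
visual_feats = torch.randn(batch, num_rois, config.visual_feat_dim)    # RoI features (e.g. detector fc7)
visual_pos = torch.rand(batch, num_rois, config.visual_pos_dim)        # normalized box coordinates

with torch.no_grad():
    out = model(
        input_ids=input_ids,
        visual_feats=visual_feats,
        visual_pos=visual_pos,
        output_attentions=True,
    )

# One tensor per cross-modality layer, shaped (batch, heads, seq_len, num_rois):
# the attention weight of each word over each RoI.
cross = out.cross_encoder_attentions

# Average over heads for a word-by-RoI map you can plot as a heatmap.
word_to_roi = cross[-1].mean(dim=1)[0]  # (seq_len, num_rois)
print(len(cross), tuple(word_to_roi.shape))
```

The model also returns `language_attentions` and `vision_attentions` for the unimodal encoders, so the same flag covers the self-attention maps as well.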