When forwarding a sentence pair (sentence A, sentence B) through BERT (as shown below):
[Figure: sentence A and sentence B fed together into BERT as one input pair]
How can I summarize the attention scores that each word of sentence B assigns to each word of sentence A?
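A minimal sketch of one possible summary, under several assumptions. It takes one example's attention weights as an array of shape (num_layers, num_heads, seq_len, seq_len), where row i is the attention distribution of query token i. It averages over all layers and heads, which is a crude choice; the hypothetical `layers` argument lets you restrict the average instead. It then keeps only rows for sentence-B tokens and columns for sentence-A tokens. WordPiece sub-tokens are merged into words using one common convention: attention *to* a split word is summed over its sub-tokens, and attention *from* a split word is averaged over its sub-tokens. The function name and its arguments are illustrative, not part of any library. The rows of the result need not sum to 1, because B words also attend to [CLS], [SEP], and to sentence B itself.

```python
import numpy as np


def cross_sentence_attention(attentions, sequence_ids, word_ids, layers=None):
    """Summarize how much each word of sentence B attends to each word of sentence A.

    attentions:   array of shape (num_layers, num_heads, seq_len, seq_len) for ONE
                  example; row i is the attention distribution of query token i.
    sequence_ids: per token, 0 for sentence A, 1 for sentence B, None for [CLS]/[SEP].
    word_ids:     per token, a word index (None for special tokens); sub-tokens of
                  the same word share the same index.
    layers:       optional iterable of layer indices to keep (default: all layers).

    Returns an array of shape (num_words_B, num_words_A).
    """
    att = np.asarray(attentions, dtype=float)
    if att.ndim != 4:
        raise ValueError("expected (num_layers, num_heads, seq_len, seq_len)")
    if layers is not None:
        att = att[list(layers)]
    att = att.mean(axis=(0, 1))  # average over layers and heads -> (seq_len, seq_len)
    seq_len = att.shape[0]

    pairs = list(enumerate(zip(sequence_ids, word_ids)))
    a_tokens = [(t, w) for t, (s, w) in pairs if s == 0]
    b_tokens = [(t, w) for t, (s, w) in pairs if s == 1]
    # Map word indices to 0..n-1 per sentence, whether or not they restart for B.
    a_index = {w: i for i, w in enumerate(sorted({w for _, w in a_tokens}))}
    b_index = {w: i for i, w in enumerate(sorted({w for _, w in b_tokens}))}

    # Key side: SUM attention over the sub-tokens of each A word.
    to_a_word = np.zeros((seq_len, len(a_index)))
    for t, w in a_tokens:
        to_a_word[t, a_index[w]] = 1.0
    att_to_a = att @ to_a_word  # (seq_len, num_words_A)

    # Query side: AVERAGE the rows of the sub-tokens of each B word.
    from_b_word = np.zeros((len(b_index), seq_len))
    for t, w in b_tokens:
        from_b_word[b_index[w], t] = 1.0
    from_b_word /= from_b_word.sum(axis=1, keepdims=True)
    return from_b_word @ att_to_a  # (num_words_B, num_words_A)
```

If you use Hugging Face `transformers` (an assumption, since no library is named above), the inputs could come from a fast tokenizer such as `BertTokenizerFast` and `BertModel`. Call `inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")` and then `outputs = model(**inputs, output_attentions=True)`. The expression `torch.stack(outputs.attentions)[:, 0].detach().numpy()` gives the (num_layers, num_heads, seq_len, seq_len) array, and `inputs.sequence_ids(0)` and `inputs.word_ids(0)` give the two per-token lists.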