How is the retrieval loss calculated in the RAG model?

I was reading the code for RAG (Retrieval-Augmented Generation) on the transformers GitHub repo.

I wanted to know how gradients are backpropagated all the way to the query encoder model, and I wrote an answer for it.

But then I wondered: how is the loss for the retrieval model (the query encoder) calculated by simply taking the softmax over the doc_scores? Here
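For context, here is my rough understanding of where doc_scores come from (toy tensors, not the actual transformers code): they are the inner products between the question encoder's output and the retrieved document embeddings, which is why a loss built on them can backpropagate into the query encoder.

```python
import torch

# Toy shapes: batch of 2 queries, 4 retrieved docs, embedding dim 8.
# (Hypothetical tensors; in transformers these come from the question
# encoder and the retriever's document index.)
batch_size, n_docs, dim = 2, 4, 8
question_hidden = torch.randn(batch_size, dim, requires_grad=True)
retrieved_doc_embeds = torch.randn(batch_size, n_docs, dim)

# doc_scores[i, j] = inner product of query i with its j-th retrieved doc.
# Because question_hidden requires grad, anything computed from
# doc_scores sends gradients back into the question encoder.
doc_scores = torch.bmm(
    question_hidden.unsqueeze(1),          # (batch, 1, dim)
    retrieved_doc_embeds.transpose(1, 2),  # (batch, dim, n_docs)
).squeeze(1)                               # (batch, n_docs)
```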

I get that they are adding the softmax over doc_scores to the seq_logits loss in order to backpropagate the gradients. But I am unable to understand the intuition behind taking a softmax over doc_scores, and why that probability distribution is added to the seq_logits loss. The two are different things altogether.
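To make my confusion concrete, here is a minimal sketch of what I understand the marginalization step to be doing (toy shapes and random tensors, assuming the log-softmax/logsumexp structure I saw in modeling_rag.py):

```python
import torch
import torch.nn.functional as F

# Toy dimensions: 1 query, 4 retrieved docs, sequence length 3, vocab 10.
batch_size, n_docs, seq_len, vocab = 1, 4, 3, 10
doc_scores = torch.randn(batch_size, n_docs)           # retriever scores
seq_logits = torch.randn(batch_size * n_docs, seq_len, vocab)  # one sequence per doc

# log p(z|x): softmax over doc_scores turns raw retrieval scores
# into a probability distribution over the retrieved docs.
doc_logprobs = F.log_softmax(doc_scores, dim=1).unsqueeze(-1).unsqueeze(-1)

# log p(y_i|x, z): the generator's token distribution given each doc.
seq_logprobs = F.log_softmax(seq_logits, dim=-1).view(
    batch_size, n_docs, seq_len, vocab
)

# Adding log-probs multiplies probabilities:
#   log p(z|x) + log p(y|x,z) = log[p(z|x) * p(y|x,z)]
# and logsumexp over the doc axis marginalizes z out:
#   p(y|x) = sum_z p(z|x) * p(y|x,z)
marginalized = torch.logsumexp(doc_logprobs + seq_logprobs, dim=1)
print(marginalized.shape)  # (batch_size, seq_len, vocab)
```

So the doc_scores softmax and the seq_logits are not two separate losses being summed; if I read this right, they are combined in log space before the NLL loss is computed. What I am missing is the intuition for why this is the right way to train the query encoder.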