RAG Example and Word-Level contributions

PereLluis13 · October 11, 2020, 4:07pm

When RAG was presented, it came with this very nice post:

I was wondering if there is a way to obtain the same information shown in the graphs when using the HF RAG implementation: the document weights, as well as the word-level contribution referred to in the article, or the RAG-Token document posterior as in the paper.

I am aware the document weights can be obtained when doing a forward pass; however, they are not available when using the generate() method, which I think would be a “nice to have”. I guess for now they can be obtained with an extra forward pass before generating, or by tweaking the generate method locally to return them.

However, I am not sure how the posterior for each document is obtained. I am guessing it has to do with an average value from the tokens coming from each of the documents (so one would need to “split” the last hidden layer into the document chunks?). Does anyone know how these could be obtained at generation time to produce figures similar to those in the article?

Thanks

lhoestq · October 12, 2020, 10:02am

There’s a demo by the awesome @yjernite that shows that you can get the per-example and per-word contributions.

There’s currently a PR to open-source the demo here, where you can check the code.

Not sure we can get the RAG-Token posterior easily, though.

PereLluis13 · October 12, 2020, 10:46am

Thank you very much, that is what I needed. I thought the word-level contributions used there were the posteriors; maybe @yjernite could clarify that.

However, while checking how the word-level contribution is computed, I realized there’s something odd in the RAG documentation. The output of the forward function for the decoder should be (batch_size*config.n_docs, sequence_length, config.vocab_size) rather than (batch_size, sequence_length, config.vocab_size) as described in the docs and in the source file. I have tested this, and the current version of Transformers behaves this way. Should I open an issue on GitHub for this? (A PR may be too much for such a small change.)

yjernite · October 12, 2020, 3:09pm

Hi @PereLluis13.

First, we do need to update the documentation, thanks for pointing it out! The current model has two forward modes corresponding to the two shapes you mentioned: without and with marginalization, controlled by the do_marginalize flag, which is set to True in the generate function:

https://github.com/huggingface/transformers/blob/13c185771847370d695b8eee3cbf12f4edc2111c/src/transformers/modeling_rag.py#L1086


    decoder_input_ids=None,
    decoder_attention_mask=None,
    past_key_values=None,
    context_input_ids=None,
    context_attention_mask=None,
    doc_scores=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    output_retrieved=None,
    do_marginalize=None,
    reduce_loss=None,
    labels=None,
    **kwargs  # needs kwargs for generation
):
    r"""
    do_marginalize (:obj:`bool`, `optional`):
        If :obj:`True`, the logits are marginalized over all documents
        by making use of ``torch.nn.functional.log_softmax``.
    reduce_loss (:obj:`bool`, `optional`):
        Only relevant if ``labels`` is passed.
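
As a rough illustration of what marginalizing over documents means for the two shapes discussed above, the sketch below uses plain Python lists rather than the actual tensors. It assumes the per-document logits of a single example have already been regrouped from (batch_size*config.n_docs, sequence_length, config.vocab_size) into one list per document. The helper names `log_softmax`, `logsumexp` and `marginalize`, and the toy numbers, are illustrative only and are not taken from modeling_rag.py.

```python
import math


def log_softmax(xs):
    # Numerically stable log-softmax over a flat list of scores.
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a flat list.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginalize(doc_scores, per_doc_logits):
    """Combine per-document token distributions into one distribution per step.

    doc_scores:      n_docs retrieval scores for one example (unnormalized).
    per_doc_logits:  per_doc_logits[d][t][v] is the decoder logit for
                     vocabulary entry v at step t, conditioned on document d.
    Returns log-probabilities indexed as [t][v].
    """
    doc_logprobs = log_softmax(doc_scores)  # log p(d | q)
    tok_logprobs = [[log_softmax(step) for step in doc] for doc in per_doc_logits]
    n_docs = len(per_doc_logits)
    seq_len = len(per_doc_logits[0])
    vocab_size = len(per_doc_logits[0][0])
    return [
        [
            logsumexp([doc_logprobs[d] + tok_logprobs[d][t][v] for d in range(n_docs)])
            for v in range(vocab_size)
        ]
        for t in range(seq_len)
    ]


# Toy example (made-up numbers): 2 documents, 2 decoding steps, vocabulary of 3.
doc_scores = [2.0, 0.5]
per_doc_logits = [
    [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]],
    [[0.0, 2.0, 0.0], [-1.0, 0.0, 1.0]],
]
marginal = marginalize(doc_scores, per_doc_logits)
```

Each row of `marginal` is a proper log-distribution over the vocabulary, since it is a mixture of the per-document distributions weighted by the document probabilities. Without marginalization, the n_docs per-document distributions stay separate, which is what the word-level contributions rely on.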

You are right that getting the posterior token-level probabilities (used in the demo) currently requires an additional forward pass (on the decoder side only; you can re-use the encoder output and retrieved documents), as you can see here:

https://github.com/yjernite/transformers/blob/291cea09c3951cc3d042661913fff2acd7148812/examples/rag/app_rag_demo.py#L142


            answer_scores += [answer_score]
            explained_gen = []
            ans_id_list = answer_ids[0].tolist()
            if tokenizer.generator.eos_token_id in ans_id_list[1:]:
                s_len = ans_id_list[1:].index(tokenizer.generator.eos_token_id)
            else:
                s_len = len(ans_id_list) - 1
            for i in range(1, s_len):
                tid = ans_id_list[i + 1]
                token = tokenizer.generator._convert_id_to_token(tid).replace("Ġ", "_")
                step_probs = eval_probs[:, i, tid].tolist()
                explained_gen += [(token, [(di, p, p * doc_probs[di]) for di, p in enumerate(step_probs)])]
            explanations += [explained_gen]
    return {
        "answers": [(te, -sc) for te, sc in zip(answer_texts, answer_scores)],
        "documents": support_docs,
        "explanations": explanations,
    }


@st.cache(allow_output_mutation=True)
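
The excerpt above keeps, for each generated token, the per-document probability `p` and the product `p * doc_probs[di]`. One plausible way to turn those products into a per-token distribution over documents is to normalize them across documents at each step. The following is a minimal sketch under that assumption, using plain lists and made-up numbers; `token_posteriors` is a hypothetical helper, not part of the demo.

```python
def token_posteriors(doc_probs, step_probs_per_token):
    """Normalize p(token | d) * p(d | q) across documents for each token.

    doc_probs:             prior document probabilities, one per document.
    step_probs_per_token:  for each generated token, the probability each
                           document assigns to it (same order as doc_probs).
    Assumes at least one document gives each token non-zero probability.
    """
    posteriors = []
    for step_probs in step_probs_per_token:
        joint = [p * prior for p, prior in zip(step_probs, doc_probs)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors


# Toy example (made-up numbers): two documents with equal priors, two tokens.
example = token_posteriors([0.5, 0.5], [[0.9, 0.1], [0.2, 0.2]])
```

In this toy case the first token is attributed mostly to the first document, while the second token is split evenly because both documents assign it the same probability.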

We’ve been going back and forth on returning the scores in the generate function; it will likely be available in a future PR.

The demo uses the retrieval scores for the documents, which correspond to the priors for generation. To get the posteriors, you can use Bayes’ rule with the config.n_docs doc-level log-likelihoods obtained with do_marginalize=False:

p(d \mid q, a) \propto p(a \mid d, q) \times p(d \mid q)
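
The Bayes’ rule update above can be sketched in log space as follows. This is only an illustration with plain Python lists: it assumes you already have the retrieval scores for one question and the per-document answer log-likelihoods log p(a|d,q) (for example, summed token log-probabilities from a pass with do_marginalize=False). The function name `doc_posterior` and the numbers are hypothetical.

```python
import math


def log_softmax(xs):
    # Numerically stable log-softmax over a flat list of scores.
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def doc_posterior(doc_scores, answer_logliks):
    """Return p(d | q, a) for each document.

    doc_scores:      unnormalized retrieval scores; their softmax is p(d | q).
    answer_logliks:  log p(a | d, q) for each document, in the same order.
    """
    log_prior = log_softmax(doc_scores)
    log_joint = [lp + ll for lp, ll in zip(log_prior, answer_logliks)]
    # Normalizing the joint over documents applies the proportionality constant.
    log_post = log_softmax(log_joint)
    return [math.exp(x) for x in log_post]


# Toy example (made-up numbers): equal priors, so the posterior follows
# the likelihoods 0.8 and 0.2 directly.
posterior = doc_posterior([0.0, 0.0], [math.log(0.8), math.log(0.2)])
```

Working in log space avoids underflow when the answer is long and the sequence likelihoods become very small.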

PereLluis13 · October 12, 2020, 4:14pm

Thank you for such a detailed response, all doubts are cleared. I should have noticed the do_marginalize flag.

For what it’s worth, I am in favor of returning scores at generate; even an option to include the posteriors would be nice (but perhaps too narrow to include in the generate function).

Anyway thanks again for the response and the nice demo!
