Hi,
I’m currently trying to calculate gradients for several VLMs. My text data consists of a question and an answer. For PaliGemma, I pass the question to the processor via the text argument and the answer via the suffix argument, like this:
model_inputs = processor(text=question, images=image, suffix=answer,... )
With this setup, the question tokens are correctly masked out when the cross-entropy loss is computed in the PaliGemma forward function. So the loss only trains the model to produce the correct answer and does not include a next-token prediction term for the question tokens.
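For completeness, this is roughly what my PaliGemma setup looks like (a sketch; the checkpoint name and preprocessing details are placeholders):

import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # placeholder checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

model_inputs = processor(text=question, images=image, suffix=answer, return_tensors="pt")
outputs = model(**model_inputs)  # the question tokens do not contribute to outputs.loss
outputs.loss.backward()          # gradients come from the answer-only loss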
For LLaVA, I build a conversation, apply the chat template, and then tokenize the result:
conversation = [
{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image"},
],
},
{
"role": "assistant",
"content": [
{"type": "text", "text": answer},
],
},
]
llava_text = processor.apply_chat_template(conversation)
model_inputs = processor(text=llava_text, images=image,...)
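To get a loss I then pass a copy of the input IDs as labels (a rough sketch of my setup, details omitted):

model_inputs["labels"] = model_inputs["input_ids"].clone()  # every position is supervised
outputs = model(**model_inputs)  # LlavaForConditionalGeneration computes the cross-entropy loss internally
outputs.loss.backward()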
Now when debugging the loss calculation inside modeling_llava.py, I noticed that the label positions corresponding to the question are not masked out. So the loss also forces the model to reproduce the question, which is not what I want for my use case.
Is there a standard way to mask out the question part here and calculate the loss only over the answer?
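The only workaround I can think of is to tokenize the prompt part separately and mask its positions with -100 by hand, roughly like this (a sketch; it assumes the tokenized prompt is an exact prefix of the full tokenized sequence, which I have not verified for every checkpoint):

# Apply the template to the user turn only, ending with the assistant prefix.
prompt_only = processor.apply_chat_template(conversation[:1], add_generation_prompt=True)
prompt_len = processor(text=prompt_only, images=image, return_tensors="pt")["input_ids"].shape[1]

labels = model_inputs["input_ids"].clone()
labels[:, :prompt_len] = -100  # ignore the question (and image) tokens in the loss
model_inputs["labels"] = labels

But I would prefer a built-in option similar to PaliGemma's suffix argument, if one exists.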
Thank you for your help!