Masking the user prompt/question for LLaVA loss computation

Hi,

I’m currently trying to calculate gradients for several VLMs. My text data contains a question and an answer. For PaliGemma, I pass the question to the processor via the text argument and the answer via the suffix argument, like this:

        model_inputs = processor(text=question, images=image,  suffix=answer,... )

With this setup, the question positions are correctly masked out of the labels when the cross-entropy loss is computed in the PaliGemma forward function. So the loss only trains the model to produce the correct answer and does not compute the next-token prediction loss for the question tokens.
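As I understand PaliGemma's processor, passing suffix makes it return labels that copy input_ids but set every prefix position (token_type_ids == 0, i.e. the image tokens plus the question) to -100. The function below is a hypothetical pure-Python sketch of that rule, not the library code; suffix_only_labels is a made-up name:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def suffix_only_labels(input_ids, token_type_ids, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: copy input_ids into labels, masking every prefix position.

    token_type_ids is assumed to be 0 for prefix tokens (image + question)
    and 1 for suffix tokens (the answer).
    """
    if len(input_ids) != len(token_type_ids):
        raise ValueError("input_ids and token_type_ids must have the same length")
    # Keep the token id only where it belongs to the suffix; everything else is ignored.
    return [tok if tt == 1 else ignore_index for tok, tt in zip(input_ids, token_type_ids)]
```

For example, suffix_only_labels([10, 11, 12, 13], [0, 0, 1, 1]) keeps only the last two ids.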

For LLaVA, I use a chat template and then tokenize this:

        conversation = [
            {

                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image"},
                ],
            },
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": answer},
                ],
            },
        ]
        llava_text = processor.apply_chat_template(conversation)
        model_inputs = processor(text=llava_text, images=image,...)

Now, when debugging the loss calculation (in modeling_llava.py), I noticed that the positions corresponding to the question are not masked out. So the loss also trains the model to reproduce the question, which is not necessary for my use case.
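To make the effect concrete, here is a small pure-Python stand-in for a causal-LM cross-entropy. It is an illustration, not the transformers implementation (which, as far as I know, applies the same shift internally): row t of the logits is scored against labels[t + 1], and any label equal to -100 is skipped. The numbers are invented; the point is only that unmasked question targets add their own terms to the mean.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def causal_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean next-token cross-entropy, skipping targets equal to ignore_index.

    logits: seq_len rows of vocab-size scores; labels: seq_len token ids.
    Row t is scored against labels[t + 1] (the causal shift).
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == ignore_index:
            continue  # masked target: contributes nothing to the loss
        row = logits[t]
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))  # stable log-sum-exp
        total += log_norm - row[target]  # -log softmax(row)[target]
        count += 1
    # Like PyTorch's "mean" reduction, this is 0/0 (NaN) when every target is ignored.
    return total / count if count else float("nan")


# Toy example (invented numbers): positions 0-1 are the "question", 2-3 the "answer".
logits = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
full_labels = [0, 1, 0, 1]
answer_only = [IGNORE_INDEX, IGNORE_INDEX, 0, 1]
print(causal_lm_loss(logits, full_labels))
print(causal_lm_loss(logits, answer_only))
```

With full_labels, the poorly predicted question target is averaged in; with answer_only, only the two answer targets remain.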

Is there a standard way to mask the question part here and only calculate the loss over the answer?

Thank you for your help!


I have the same question. Can anyone answer this? How can the question tokens be ignored in the loss calculation? This is particularly important for VLMs, since we don’t want the model to learn to predict the question tokens from the image input alone.


I’m not too familiar with LLaVA, but are you able to pass in labels? For LLMs, I set the labels of the prompt tokens to -100, which is the default ignore_index of PyTorch’s CrossEntropyLoss, so those positions are skipped in the loss.
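A sketch of that idea for the LLaVA chat-template case, under some assumptions: the template is llava-1.5 style, so the assistant turn starts right after an ASSISTANT: marker; the marker's token ids come from something like processor.tokenizer("ASSISTANT:", add_special_tokens=False).input_ids (verify them, since a string can tokenize differently in context); and the helper names are hypothetical. Because it searches the final input_ids returned by the processor, it does not depend on how many image tokens <image> was expanded into.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_last_subsequence(seq, sub):
    """Return the start index of the last occurrence of sub in seq, or -1."""
    n, m = len(seq), len(sub)
    for start in range(n - m, -1, -1):
        if list(seq[start:start + m]) == list(sub):
            return start
    return -1


def mask_prompt_labels(input_ids, marker_ids, attention_mask=None, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: supervise only the tokens after the LAST assistant marker.

    Everything up to and including the marker (image tokens, the question,
    the marker itself) becomes ignore_index; padding positions
    (attention_mask == 0) are masked as well.
    """
    if not marker_ids:
        raise ValueError("marker_ids must not be empty")
    start = find_last_subsequence(input_ids, marker_ids)
    if start < 0:
        raise ValueError("assistant marker not found; check how it tokenizes in context")
    answer_start = start + len(marker_ids)
    labels = [ignore_index] * answer_start + list(input_ids[answer_start:])
    if attention_mask is not None:
        labels = [lab if keep else ignore_index for lab, keep in zip(labels, attention_mask)]
    return labels
```

The resulting list would then be converted to a tensor and passed as labels= to the model's forward. An alternative is to render a prompt-only conversation with add_generation_prompt=True, process it with the same image, and mask that many leading positions; that only works if the prompt tokenizes as an exact prefix of the full sequence.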
