Prompt-tuning for Multimodal model

alphairawan · March 25, 2025, 9:21pm

I am currently working with a multimodal model, LLaVA-NeXT-Video, for video classification, and I would like to try prompt tuning on it. When I went through this notebook example for prompt tuning and the documentation, I could not find how the prompt data should be specified for multimodal prompting. In my case, I use the following chat template:

template = [
    {
        "role": "user",
        "content": [
            {"type": "video"},
            {
                "type": "text",
                "text": "Please classify the behaviour in the video if it contain punching",
            },
        ],
    }
]

Is there any reference or GitHub repo showing how to use PEFT for prompt tuning with multimodal prompts that are built from a chat template?

John6666 · March 26, 2025, 8:23am

Perhaps this is an example.

github.com/NielsRogge/Transformers-Tutorials

LLaVA-NeXT-Video/Fine_tune_LLaVa_NeXT_Video_with_HFTrainer.ipynb

master

{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "6fc10208-7733-4000-8adf-2019708b2c2b",
      "metadata": {
        "id": "6fc10208-7733-4000-8adf-2019708b2c2b"
      },
      "source": [
        "## Prerequisites\n",
        "Before we start, make sure you have the following:\n",
        "\n",
        "Access to GPUs (preferably 80GB or more since videos require high sequence lengths).\n",
        "Familiarity with Hugging Face’s Transformers library.\n",
        "Pre-install necessary packages by running the below.\n",
        "\n",
        "From video decoders you can install only one, the one you will use. Below I will provide helper functions to read videos using any of the three libraries, yet the default is decord which I found to be x8-10 faster."
      ]
    },
    {

This file has been truncated.
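For intuition about what prompt tuning would add on top of a chat template like the one in the question, here is a minimal, library-free sketch. It is not taken from the linked notebook or from PEFT. It assumes the video frames and the text have already been turned into embedding vectors, and it only shows the bookkeeping: a small set of trainable "virtual token" vectors is prepended to the input embeddings, and the attention mask and labels are extended so the lengths still match. All names (`make_soft_prompt`, `prepend_soft_prompt`, the toy sizes, the label id 42) are hypothetical.

```python
import random

IGNORE_INDEX = -100  # label value conventionally skipped by the cross-entropy loss


def make_soft_prompt(num_virtual_tokens, hidden_size, seed=0):
    """Create the soft-prompt vectors; in prompt tuning these are the only trained parameters."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        for _ in range(num_virtual_tokens)
    ]


def prepend_soft_prompt(soft_prompt, input_embeds, attention_mask, labels):
    """Prepend the soft prompt to one example and extend mask and labels to match."""
    n = len(soft_prompt)
    new_embeds = soft_prompt + input_embeds      # [soft prompt | video | text]
    new_mask = [1] * n + attention_mask          # virtual tokens are always attended to
    new_labels = [IGNORE_INDEX] * n + labels     # no loss is computed on virtual tokens
    return new_embeds, new_mask, new_labels


# Toy example: 6 video positions and 3 text positions, hidden size 4 (all assumed values).
hidden_size = 4
video_embeds = [[0.0] * hidden_size for _ in range(6)]
text_embeds = [[1.0] * hidden_size for _ in range(3)]
input_embeds = video_embeds + text_embeds
attention_mask = [1] * len(input_embeds)
# Only the last position is supervised here; 42 is a made-up token id for the class label.
labels = [IGNORE_INDEX] * (len(input_embeds) - 1) + [42]

soft_prompt = make_soft_prompt(num_virtual_tokens=5, hidden_size=hidden_size)
embeds, mask, new_labels = prepend_soft_prompt(
    soft_prompt, input_embeds, attention_mask, labels
)
print(len(embeds), len(mask), len(new_labels))
```

In a real setup the base model stays frozen and only the soft-prompt vectors receive gradients. Whether PEFT's prompt tuning wrapper handles the video inputs of LLaVA-NeXT-Video out of the box is something to verify against the PEFT and Transformers documentation rather than assume.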
