Fine-tuning a multimodal model

hello! Does anyone have an example of how to fine-tune a multimodal model? The examples I’ve found only take text as input, i.e. plain LLMs. I’m dealing with Vision-Language Models.
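
For context, this is the rough shape of what I’ve pieced together so far by adapting the text-only examples. It’s a minimal, untested sketch: the checkpoint name (llava-hf/llava-1.5-7b-hf), the prompt template, and the toy image/answer pair are just placeholders, and I haven’t verified it runs end to end.

```python
# Rough sketch (untested) of supervised fine-tuning for a Vision-Language Model
# with Hugging Face Transformers. Model id, prompt format, and data are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint; swap in your VLM
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.train()

# Toy (image, prompt, answer) example; in practice this comes from a DataLoader.
image = Image.new("RGB", (336, 336), color="white")  # placeholder image
prompt = "USER: <image>\nDescribe the image. ASSISTANT: A blank white square."

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.device, torch.bfloat16)  # casts only the float tensors (pixel_values)

# Causal-LM style SFT: labels are the input ids themselves
# (optionally mask the prompt tokens with -100 so only the answer is supervised).
labels = inputs["input_ids"].clone()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

outputs = model(**inputs, labels=labels)  # forward pass returns the LM loss
outputs.loss.backward()                   # backprop through vision tower + language model
optimizer.step()
optimizer.zero_grad()
print("loss:", outputs.loss.item())
```

Is this roughly the right direction, or is there a recommended recipe (e.g. LoRA via peft, or a trainer that handles the image inputs) for VLMs specifically?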