Fine-tuning Blip3-o with Runpod

Buildesolutions · June 17, 2025, 2:49am

Hello! I’ve been trying to fine-tune the multimodal model Blip3-o using Runpod, with ChatGPT as a guide, but I keep going in circles on dependency problems. I’ve never been able to run a test.py, or even my dataset, and get an output. It’s always something. Any guidance from someone who has been able to pull this off?
Thanks!

Pimpcat-AU · June 17, 2025, 6:21am

Use a prebuilt Docker image
Try:

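FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
# Assumption: the -runtime image may not ship git, which pip needs for git+https installs.
RUN apt-get update && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*
RUN pip install git+https://github.com/salesforce/LAVIS.git@main \
 && pip install accelerate transformers bitsandbytes datasets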

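Validate the LAVIS install with a simple test (note that this loads an InstructBLIP checkpoint built on BLIP-2, not BLIP3-o itself):

from lavis.models import load_model_and_preprocess

# Loads the model plus its image and text preprocessors in eval mode.
# "flant5xl" is the model_type LAVIS registers for blip2_t5_instruct;
# "pretrain_flant5xl" belongs to the plain blip2_t5 model.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5_instruct",
    model_type="flant5xl",
    is_eval=True,
    device="cuda",
)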

Start with instruct_finetune.py
Don't launch the full training loop until you're sure the dataset and model initialize cleanly.

Format your dataset properly
Use HuggingFace-style datasets and avoid raw file paths on Runpod. Structure matters.
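
One way to get a HuggingFace-style layout is the "imagefolder" convention, where a metadata.jsonl file sits next to the images and carries a file_name column plus any extra columns. The sketch below is only an illustration, not something from the BLIP3-o repo: the write_metadata helper and the "text" caption column are assumed names, and it uses only the standard library.

```python
import json
from pathlib import Path


def write_metadata(image_dir, captions):
    """Write metadata.jsonl next to the images; return the number of rows written.

    `captions` maps an image file name (relative to image_dir) to its caption.
    Entries whose image file does not exist are skipped.
    """
    image_dir = Path(image_dir)
    rows = [
        {"file_name": name, "text": caption}  # "text" is an assumed column name
        for name, caption in sorted(captions.items())
        if (image_dir / name).is_file()
    ]
    with open(image_dir / "metadata.jsonl", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)
```

With that file in place, datasets.load_dataset("imagefolder", data_dir=...) should expose an image column and the caption column, so the training code never handles raw paths directly.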

Notes:

bitsandbytes will silently fail if the GPU doesn’t support it.

Check transformers version if tokenizer fails to load.

Set HF_HOME=/workspace/hf_cache in your Docker to avoid cache collisions.
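
A quick way to act on the notes above is to print which versions are actually installed and where the cache points before starting anything heavy. This is a minimal sketch using only the standard library. The environment_report helper is a hypothetical name, and the package list simply mirrors the pip line earlier in this post. It does not test whether bitsandbytes works on the GPU, only whether it is installed.

```python
import importlib.metadata
import os


def environment_report(
    packages=("torch", "transformers", "accelerate", "bitsandbytes", "datasets"),
):
    """Return installed versions (None if missing) plus the HF_HOME value."""
    report = {}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None
    # None here means HF_HOME is unset and the default cache location is used.
    report["HF_HOME"] = os.environ.get("HF_HOME")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'NOT FOUND'}")
```

Pasting this output into the thread makes it much easier for others to spot a version mismatch.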

Let me know your dataset format if you need a single-image test script to verify captioning before tuning.

Solution provided by Triskel Data Deterministic AI.

Buildesolutions · June 17, 2025, 2:49pm

Thanks so much! Working on it… will let you know.

Buildesolutions · June 17, 2025, 3:05pm

Hello, after running both pip lines and creating a Python file (test_blip2.py), I get the following error when running the test script you suggested:

Traceback (most recent call last):
  File "/workspace/test_blip.py", line 1, in <module>
    from lavis.models import load_model_and_preprocess
ModuleNotFoundError: No module named 'lavis'

Thoughts?

John6666 · June 17, 2025, 3:18pm

Perhaps lavis has not been installed…

By the way, for BLIP-3o, it may be more reliable to use torchrun rather than lavis (according to the official GitHub).
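
A ModuleNotFoundError right after a pip install often means pip installed into a different Python than the one running the script. A minimal sketch for checking that, using only the standard library (locate_module is a hypothetical helper name):

```python
import importlib.util
import sys


def locate_module(name):
    """Return where the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("lavis:", locate_module("lavis") or "not importable from this interpreter")
```

If it reports that lavis is not importable, re-running the install as python -m pip install ... with that same interpreter usually puts the package where the script can see it.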
