Fine-tuning Blip3-o with Runpod

Hello! I've been trying to fine-tune the multimodal model Blip3-o on Runpod, using ChatGPT as a guide, but I feel like I keep going in circles with dependency problems. I've never been able to run a test.py, or even load my dataset, and get any output. It's always something. Can anyone who has managed to pull this off offer some guidance?
Thanks!

Use a prebuilt Docker image
Try:

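FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
# Make sure git is available for pip's git+https:// install (no-op if already present)
RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*
# The trailing backslash keeps both pip commands in a single RUN instruction
RUN pip install git+https://github.com/salesforce/LAVIS.git@main \
    && pip install accelerate transformers bitsandbytes datasets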

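Validate the LAVIS install with a simple test. Note that this loads LAVIS's InstructBLIP model (BLIP-2 with a FlanT5-XL language model), not BLIP-3 or BLIP3-o itself: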

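from lavis.models import load_model_and_preprocess

# Loads InstructBLIP (BLIP-2 + FlanT5-XL) from LAVIS.
# For name="blip2_t5_instruct" the model_type is "flant5xl";
# "pretrain_flant5xl" belongs to the plain "blip2_t5" model.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5_instruct",
    model_type="flant5xl",
    is_eval=True,   # load in inference mode
    device="cuda",
)
print("Loaded:", type(model).__name__)  # reaching this line means the install works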

Start with instruct_finetune.py
Don't launch the full training loop until you're sure the dataset and model initialize cleanly.
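One way to follow that advice is a tiny smoke test: pull a couple of samples, collate them, and run a single step before committing GPU hours. This is a framework-agnostic sketch that assumes you pass in your own dataset (anything indexable), collate function, and a function that performs one forward (or forward+backward) pass; smoke_test and the toy lambdas are hypothetical names, not part of LAVIS or the BLIP3-o repository.

```python
from typing import Any, Callable, Sequence


def smoke_test(
    dataset: Sequence[Any],
    collate_fn: Callable[[list], Any],
    step_fn: Callable[[Any], Any],
    num_samples: int = 2,
) -> str:
    """Push one tiny batch through collate + step; report the first stage that fails.

    Hypothetical helper: plug in your real dataset, collator and a
    function that runs a single training/inference step on a batch.
    """
    stage = "dataset"
    try:
        if len(dataset) == 0:
            return "dataset: empty"
        samples = [dataset[i] for i in range(min(num_samples, len(dataset)))]
        stage = "collate"
        batch = collate_fn(samples)
        stage = "step"
        step_fn(batch)
    except Exception as exc:  # attribute the failure to the stage that raised it
        return f"{stage}: {type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    toy_data = [{"image": f"img_{i}.jpg", "text": f"caption {i}"} for i in range(4)]
    print(smoke_test(toy_data, lambda xs: [x["text"] for x in xs], len))
    # A collate function that expects the wrong field is reported as a collate failure
    print(smoke_test(toy_data, lambda xs: [x["caption"] for x in xs], len))
```

Only when every stage reports "ok" on real data is it worth starting the full loop.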

Format your dataset properly
Use Hugging Face-style datasets rather than raw file paths on Runpod. Structure matters.
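As one possible layout (an assumption, not a format required by BLIP3-o), you can keep image/caption pairs in a JSON Lines file, one record per line, and validate it before training. The field names image and text and the helpers write_manifest and validate_manifest are hypothetical; rename them to whatever your training script expects.

```python
import json
import os
import tempfile
from typing import Iterable, List, Sequence


def write_manifest(records: Iterable[dict], path: str) -> int:
    """Write records as JSON Lines (one JSON object per line); return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
    return count


def validate_manifest(path: str, required: Sequence[str] = ("image", "text")) -> List[str]:
    """Return a list of problems; an empty list means every record looks usable."""
    problems = []
    base = os.path.dirname(os.path.abspath(path))
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(rec, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            missing = [k for k in required if not rec.get(k)]
            if missing:
                problems.append(f"line {lineno}: missing {missing}")
                continue
            # Relative image paths are resolved against the manifest's own folder
            if not os.path.isfile(os.path.join(base, rec["image"])):
                problems.append(f"line {lineno}: image not found: {rec['image']}")
    return problems


if __name__ == "__main__":
    # Demo in a throwaway folder: one good record, one broken path, one missing caption
    folder = tempfile.mkdtemp()
    open(os.path.join(folder, "cat.jpg"), "wb").close()
    manifest = os.path.join(folder, "train.jsonl")
    write_manifest(
        [{"image": "cat.jpg", "text": "a cat"}, {"image": "dog.jpg", "text": "a dog"}, {"image": "cat.jpg"}],
        manifest,
    )
    for problem in validate_manifest(manifest):
        print(problem)
```

Once the list comes back empty, datasets.load_dataset("json", data_files=manifest) can read the file. Because the image column holds plain path strings, storing absolute paths under /workspace avoids any ambiguity about what they are relative to.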

Notes:

bitsandbytes will silently fail if your GPU doesn't support it.

Check your transformers version if the tokenizer fails to load.

Set HF_HOME=/workspace/hf_cache in your Docker image (ENV HF_HOME=/workspace/hf_cache in the Dockerfile) to avoid cache collisions.
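A short environment report ties these notes together: it sets HF_HOME before any Hugging Face library is imported (the cache location is read when those libraries are imported), prints the installed versions so a tokenizer or bitsandbytes failure can be matched to a version, and shows which GPU torch actually sees. This is a sketch; the helper names are hypothetical, and salesforce-lavis is the distribution name LAVIS installs under.

```python
import os
from importlib import metadata


def ensure_hf_home(default: str = "/workspace/hf_cache") -> str:
    """Set HF_HOME only if it is unset; call this before importing transformers/datasets."""
    return os.environ.setdefault("HF_HOME", default)


def package_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_summary() -> str:
    """Describe the GPU torch sees; bitsandbytes support depends on this setup."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    major, minor = torch.cuda.get_device_capability(0)
    return f"{torch.cuda.get_device_name(0)} (compute capability {major}.{minor}), CUDA {torch.version.cuda}"


if __name__ == "__main__":
    print("HF_HOME =", ensure_hf_home())
    pkgs = ["torch", "transformers", "accelerate", "bitsandbytes", "datasets", "salesforce-lavis"]
    for pkg, ver in package_versions(pkgs).items():
        print(f"{pkg:>18}: {ver or 'NOT INSTALLED'}")
    print("GPU:", cuda_summary())
```

Pasting this output into a forum post usually answers the first round of "which version are you on?" questions.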

Let me know your dataset format if you need a single-image test script to verify captioning before tuning.

Solution provided by Triskel Data Deterministic AI.

Thanks so much! Working on it… will let you know.

Hello, after running both pip lines and creating a Python file (test_blip2.py) with the test script you suggested, I get the following error when I run it:

Traceback (most recent call last):
  File "/workspace/test_blip.py", line 1, in <module>
    from lavis.models import load_model_and_preprocess
ModuleNotFoundError: No module named 'lavis'

Thoughts?

Perhaps lavis has not been installed…
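A quick way to tell "not installed" apart from "installed into a different Python" is to ask the interpreter that runs the test script whether it can find lavis, and from where. This is a generic, standard-library-only sketch; diagnose_import is a hypothetical helper name. If the module is missing, installing with that same interpreter's python -m pip ties the package to the Python that will import it.

```python
import importlib.util
import sys


def diagnose_import(module: str, install_hint: str = "<package>") -> str:
    """Report whether `module` is importable by *this* interpreter, and from where."""
    spec = importlib.util.find_spec(module)
    if spec is None:
        return (
            f"'{module}' is not importable from {sys.executable}; "
            f"try: {sys.executable} -m pip install {install_hint}"
        )
    return f"'{module}' found at {spec.origin}"


if __name__ == "__main__":
    print(diagnose_import("lavis", "git+https://github.com/salesforce/LAVIS.git@main"))
```

If the install command itself printed errors (for example dependency conflicts), those need to be resolved first; this check only tells you whether the import can succeed.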

By the way, for BLIP-3o it may be more reliable to use the training scripts from the official GitHub repository, which are launched with torchrun, rather than LAVIS.
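For context, torchrun is PyTorch's launcher: it starts one process per GPU and tells each process its position through environment variables such as RANK, LOCAL_RANK and WORLD_SIZE. The sketch below only shows how a training script can read those variables with a single-process fallback; it is not the BLIP3-o training code, and DistInfo and read_torchrun_env are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass
class DistInfo:
    rank: int        # global process index across all nodes
    local_rank: int  # process index on this machine (usually the GPU index)
    world_size: int  # total number of processes

    @property
    def is_main(self) -> bool:
        return self.rank == 0


def read_torchrun_env(env=None) -> DistInfo:
    """Read the variables torchrun sets; fall back to one process under plain `python`."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    info = read_torchrun_env()
    if info.is_main:
        print(f"world_size={info.world_size}")
    print(f"rank {info.rank} would use cuda:{info.local_rank}")
```

Launched as torchrun --nproc_per_node=1 train.py (train.py being a hypothetical script name), a real script would then typically call torch.cuda.set_device(info.local_rank) and torch.distributed.init_process_group(backend="nccl") before building the model.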