Hello! I’ve been trying to fine-tune the multimodal model BLIP3-o on Runpod, using ChatGPT as a guide, but I feel like I keep going in circles with dependency problems. I’ve never been able to run a test.py, or even my dataset, and get an output. It’s always something. Any guidance from someone who has been able to pull this off?
Thanks!
Use a prebuilt Docker image
Try:
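```dockerfile
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
# git is needed for pip's git+https:// installs and may be missing from the runtime image
RUN apt-get update && apt-get install -y git
RUN pip install git+https://github.com/salesforce/LAVIS.git@main \
    && pip install accelerate transformers bitsandbytes datasets
ENV HF_HOME=/workspace/hf_cache
```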
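After the image builds (or after running the pip lines by hand), it is worth confirming that the packages actually landed in the interpreter you will use before touching any model code. A minimal standard-library sketch; the mapping from import names to pip distribution names (for example lavis to salesforce-lavis) is an assumption to double-check against your install.

```python
import importlib.util
from importlib import metadata

# Import names mapped to pip distribution names.
# Assumption: the LAVIS repo installs under the distribution name "salesforce-lavis".
REQUIRED = {
    "torch": "torch",
    "lavis": "salesforce-lavis",
    "accelerate": "accelerate",
    "transformers": "transformers",
    "bitsandbytes": "bitsandbytes",
    "datasets": "datasets",
}


def check_packages(required):
    """Return {import_name: version string, or None if this interpreter cannot find it}."""
    report = {}
    for module_name, dist_name in required.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # importable, but the distribution name guess did not match
            report[module_name] = "installed (version unknown)"
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:14s} {version or 'MISSING'}")
```

Any MISSING line means the corresponding import will fail at runtime, so fix that before running a test script.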
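Validate the environment with a simple test. Note that blip2_t5_instruct is InstructBLIP (a BLIP-2-family model in LAVIS), not BLIP3-o, so this only confirms that LAVIS, PyTorch, and the GPU work together: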
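```python
import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# InstructBLIP with a FlanT5-XL backbone; for blip2_t5_instruct the model_type
# is "flant5xl" or "flant5xxl" ("pretrain_flant5xl" belongs to the plain blip2_t5 model)
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5_instruct",
    model_type="flant5xl",
    is_eval=True,
    device=device,
)
print("Model loaded on", device)
```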
Start with instruct_finetune.py
Don't launch the full training loop until you're sure the dataset and model initialize cleanly.
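One cheap way to follow that advice is to check the first few dataset records before any model is loaded. A minimal sketch, assuming each record is a dict holding an image path and a caption; the field names image and caption are hypothetical, so rename them to match your data.

```python
from pathlib import Path


def preflight(records, n=5, image_key="image", text_key="caption"):
    """Check the first n records and return a list of problems (empty if all good).

    image_key / text_key are hypothetical field names; adapt them to your dataset.
    """
    problems = []
    for i, rec in enumerate(records):
        if i >= n:
            break
        for key in (image_key, text_key):
            if key not in rec:
                problems.append(f"record {i}: missing '{key}'")
        # the image path must point at a real file on this machine
        if image_key in rec and not Path(rec[image_key]).is_file():
            problems.append(f"record {i}: image not found: {rec[image_key]}")
        # whitespace-only captions are treated as empty
        if text_key in rec and not str(rec[text_key]).strip():
            problems.append(f"record {i}: empty '{text_key}'")
    return problems
```

Usage: call `issues = preflight(my_records)` and stop with `raise SystemExit("\n".join(issues))` if the list is non-empty, so path and schema mistakes surface in seconds instead of after the model has loaded.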
Format your dataset properly
Use Hugging Face-style datasets and avoid raw file paths on Runpod; the structure matters.
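One way to keep a dataset portable across Runpod volumes is a JSON Lines manifest with image paths stored relative to a dataset root and resolved at load time. A minimal sketch, assuming records with hypothetical image and caption fields; your training script may expect different column names.

```python
import json
from pathlib import Path


def write_jsonl(records, out_path, root):
    """Write one JSON object per line, storing image paths relative to `root`.

    The "image"/"caption" field names are hypothetical. relative_to raises
    ValueError if an image lives outside `root`.
    """
    root = Path(root)
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            row = dict(rec)
            row["image"] = Path(rec["image"]).relative_to(root).as_posix()
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path, root):
    """Read the manifest back, resolving image paths against `root` on this machine."""
    root = Path(root)
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                row["image"] = str(root / row["image"])
                rows.append(row)
    return rows
```

The same file can also be loaded with the datasets library via `load_dataset("json", data_files="train.jsonl")`; the image column then holds the relative paths, to be joined with the root in your preprocessing.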
Notes:
bitsandbytes can fail silently if the GPU doesn’t support it.
Check your transformers version if the tokenizer fails to load.
Set HF_HOME=/workspace/hf_cache in your Dockerfile to avoid cache collisions.
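If HF_HOME cannot be set in the image, it can also be set at the very top of the script, because huggingface_hub (and therefore transformers) reads it when first imported. A minimal sketch of that plus a GPU report to help with the bitsandbytes note; the function names are hypothetical, and no specific compute-capability threshold for bitsandbytes is assumed here.

```python
import os
from pathlib import Path


def configure_hf_cache(path="/workspace/hf_cache"):
    """Point the Hugging Face cache at `path` unless HF_HOME is already set.

    Call this before importing transformers / huggingface_hub.
    """
    os.environ.setdefault("HF_HOME", path)
    Path(os.environ["HF_HOME"]).mkdir(parents=True, exist_ok=True)
    return os.environ["HF_HOME"]


def gpu_summary():
    """Describe the visible GPU, or say why none is usable (torch is optional here)."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    major, minor = torch.cuda.get_device_capability(0)
    return f"{torch.cuda.get_device_name(0)} (compute capability {major}.{minor})"
```

Usage: put `configure_hf_cache()` on the first lines of the script, then `print(gpu_summary())`, and only afterwards import transformers or bitsandbytes.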
Let me know your dataset format if you need a single-image test script to verify captioning before tuning.
Solution provided by Triskel Data Deterministic AI.
Thanks so much! Working on it… will let you know.
Hello, after running both pip lines and creating a Python file (test_blip2.py), I get the following error when running the test script you suggested:
```text
Traceback (most recent call last):
  File "/workspace/test_blip.py", line 1, in <module>
    from lavis.models import load_model_and_preprocess
ModuleNotFoundError: No module named 'lavis'
```
Thoughts?
Perhaps LAVIS was not installed, or it was installed into a different Python environment than the one running the script.
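A quick way to tell those two cases apart is to ask the interpreter that runs the script where it looks, then install with that same interpreter via `python -m pip`. A minimal sketch; where_is and install_cmd are hypothetical helper names.

```python
import importlib.util
import sys


def where_is(module_name):
    """Report which interpreter is running and whether it can find `module_name`."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


def install_cmd(requirement):
    """pip command bound to this interpreter, so the package lands where the script runs."""
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    info = where_is("lavis")
    print(info)
    if not info["found"]:
        print("Try:", " ".join(install_cmd("git+https://github.com/salesforce/LAVIS.git@main")))
```

If the printed interpreter path differs from the one pip reports with `pip --version`, the earlier install went to another environment.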
By the way, for BLIP3-o it may be more reliable to use torchrun rather than LAVIS (according to the official GitHub repository).