Running out of System RAM while loading BLIP2 on Colab?

I’m trying to load the BLIP2 model on Google Colab using the code below.

!pip install --quiet bitsandbytes
!pip install --quiet --upgrade transformers # Install latest version of transformers
!pip install --quiet --upgrade accelerate
!pip install --quiet sentencepiece

model_id = "Salesforce/blip2-opt-2.7b"  # Hub repo ID; this is the name passed to from_pretrained below

from transformers import AutoProcessor, Blip2ForConditionalGeneration
from accelerate import Accelerator
import torch

accelerator = Accelerator()

processor = AutoProcessor.from_pretrained(model_id)  # the processor holds no weights, so no 8-bit flag here
model = Blip2ForConditionalGeneration.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
    offload_state_dict=True,
)
model = accelerator.prepare(model)

Even with accelerate installed and device_map set to “auto,” I’m unable to load the model. While loading, it doesn’t seem to use the GPU memory at all (about 15 GB on Colab) and instead exhausts the entire system RAM until the session crashes. Is there something that I’m missing?
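
In case it helps with reproducing this, here is a minimal sketch of how the memory numbers could be checked before and after the from_pretrained call. It assumes a Linux runtime such as Colab, where /proc/meminfo exists. The helper names (parse_meminfo, system_ram_gib, gpu_memory_gib) are my own and not part of any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name -> value in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def system_ram_gib(path="/proc/meminfo"):
    """Return (total, available) system RAM in GiB. Assumes a Linux host."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    kib_per_gib = 1024 ** 2
    return info["MemTotal"] / kib_per_gib, info["MemAvailable"] / kib_per_gib


def gpu_memory_gib():
    """Return (allocated, reserved) GiB on the current CUDA device, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return torch.cuda.memory_allocated() / gib, torch.cuda.memory_reserved() / gib
```

Printing system_ram_gib() and gpu_memory_gib() once before the load and once after (or just before the crash) should show whether the weights ever reach the GPU or stay in system RAM.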