Using Pipeline with bitsandbytes and blip2 / blip - is this possible?

Hi!

Just curious: when using the pipeline function, does it support changing the floating-point precision, or using bitsandbytes to load a model in 8-bit?
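For the precision part at least, recent transformers versions let pipeline() take a torch_dtype argument that is forwarded to from_pretrained. A minimal sketch (model ID is one from my list below; wrapped in a function since actually calling it downloads the weights):

```python
import torch
from transformers import pipeline

def make_fp16_captioner(model_id="Salesforce/blip-image-captioning-base"):
    # torch_dtype is forwarded to from_pretrained, so the weights load in fp16
    return pipeline(task="image-to-text",
                    model=model_id,
                    torch_dtype=torch.float16)
```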

For example, on my space, when trying to load in 8bit, I see the error:

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I’m not sure if this is because it isn’t supported with pipeline or just doesn’t work with BLIP.

The whole code for running the space is:

import torch
import gradio as gr
from transformers import pipeline

CAPTION_MODELS = {
    'blip-base': 'Salesforce/blip-image-captioning-base',
    'blip-large': 'Salesforce/blip-image-captioning-large',
    'vit-gpt2-coco-en': 'ydshieh/vit-gpt2-coco-en',
    'blip2-2.7b-fp16': 'Mediocreatmybest/blip2-opt-2.7b-fp16-sharded',
    'blip2-2.7b': 'Salesforce/blip2-opt-2.7b',
}

# Create a dictionary to store loaded models
loaded_models = {}

# Simple caption creation
def caption_image(model_choice, image_input, url_input, load_in_8bit):
    if image_input is not None:
        input_data = image_input
    else:
        input_data = url_input

    model_key = (model_choice, load_in_8bit)  # Create a tuple to represent the unique combination of model and 8bit loading

    # Check if the model is already loaded
    if model_key in loaded_models:
        captioner = loaded_models[model_key]
    else:
        model_kwargs = {"load_in_8bit": load_in_8bit} if load_in_8bit else {}
        captioner = pipeline(task="image-to-text",
                            model=CAPTION_MODELS[model_choice],
                            max_new_tokens=30,
                            device_map="cpu", model_kwargs=model_kwargs, use_fast=True
                            )
        # Store the loaded model
        loaded_models[model_key] = captioner

    caption = captioner(input_data)[0]['generated_text']
    return str(caption).strip()

def launch(model_choice, image_input, url_input, load_in_8bit):
    return caption_image(model_choice, image_input, url_input, load_in_8bit)

model_dropdown = gr.Dropdown(choices=list(CAPTION_MODELS.keys()), label='Select Caption Model')
image_input = gr.Image(type="pil", label="Input Image")
url_input = gr.Text(label="Input URL")
load_in_8bit = gr.Checkbox(label="Load model in 8bit")

iface = gr.Interface(launch, inputs=[model_dropdown, image_input, url_input, load_in_8bit], outputs="text")
iface.launch()

For example, I can load models in 8-bit with bitsandbytes when using Transformers directly.

But does pipeline support this? I haven't been able to find any examples.

I've converted two models to 8-bit with bitsandbytes and uploaded them along with the tokenizer as well. Same issues.
I'm guessing this is something I'm missing, or it isn't supported in pipeline?

After some more tests, it looks like the issue may be with Blip2Processor.
Since it is automatically selected within the pipeline, it doesn't seem to support dropping to 4-bit or 8-bit.

AutoProcessor, on the other hand, seems to work fine when specifying and changing the floating-point precision.

Does anyone know if it's possible to override this and force the pipeline to use AutoProcessor?

Thanks!

Also, after further testing with Transformers directly:
it looks like 8-bit is working with both the regular BLIP model and BLIP-2. It would be great to figure out whether this works in pipeline as well.
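For anyone hitting the same dtype error: this is roughly the direct-Transformers path that worked for me, assuming a GPU and bitsandbytes installed. The key detail is casting the processor's inputs to fp16 so they match the 8-bit model's compute dtype, which is what the "Input type (float) and bias type (c10::Half)" error is complaining about:

```python
import torch
from transformers import AutoProcessor, Blip2ForConditionalGeneration

def caption_8bit(image, model_id="Salesforce/blip2-opt-2.7b"):
    processor = AutoProcessor.from_pretrained(model_id)
    # 8-bit load via bitsandbytes; device_map="auto" places layers on the GPU
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_id, load_in_8bit=True, device_map="auto")
    # Cast inputs to fp16 to match the model's compute dtype and avoid the
    # "Input type (float) and bias type (c10::Half)" mismatch
    inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True).strip()
```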

@Mediocreatmybest how did you make it work? :slight_smile: