Load CLIP pretrained model on GPU

I’m using CLIP to find similarities between text and images, but I realized the pretrained models load on the CPU, which is slow. I want to load them on the GPU instead. How can I do that?

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

Thanks!

Here’s how you can put a model on the GPU (the same works for any PyTorch model):

import torch
from transformers import CLIPModel

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
model.to(device)

Yes, but my issue is with the second line. When I tried to send it to the GPU I got the error 'CLIPProcessor' object has no attribute 'cuda', and to run the code on the GPU I thought I needed to send both the model and the processor there:

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

Do you know how I can send the CLIPProcessor to the GPU?

You cannot move a processor to the GPU; it isn’t a PyTorch module. It only prepares the data for the model (tokenizing text, resizing and normalizing images), and that runs on the CPU.

The only things you need to move to the GPU are the model and the data the processor returns.
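For CLIP, that means the dict of tensors the processor produces. A minimal sketch (the image path and text labels are placeholders):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path; any PIL image works
inputs = processor(text=["a cat", "a dog"], images=image,
                   return_tensors="pt", padding=True)
inputs = inputs.to(device)  # moves the returned tensors, not the processor
outputs = model(**inputs)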

I ran the following, which worked!

from torchvision import transforms

# Convert each PIL image to a PyTorch tensor (then move it to CUDA or CPU)
transform = transforms.Compose([transforms.ToTensor()])
images = [transform(image).to(device) for image in images]

# The processor returns a dict of tensors; move the pixel values to the device
image_processor = processor(images=images, return_tensors="pt", padding=True)
image_processor['pixel_values'] = image_processor['pixel_values'].to(device)
embeddings = model.get_image_features(**image_processor)
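A small simplification, in case it helps: the processor also accepts PIL images directly, so the ToTensor step isn’t strictly required, and the object the processor returns has a .to() method that moves all of its tensors in one call. A sketch, assuming the same images, processor, model, and device as above:

inputs = processor(images=images, return_tensors="pt", padding=True)  # PIL images in
inputs = inputs.to(device)  # moves pixel_values (and any other tensors) to the GPU
embeddings = model.get_image_features(**inputs)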

Adding to @lukeigel:

import os
import logging

from PIL import Image

# Process images in batches
for batch in batches:  # `batches` holds lists of image file names
    images = []
    image_ids = []

    # Load the images for the current batch
    for image_name in batch:
        image_path = os.path.join(image_dir, image_name)
        images.append(Image.open(image_path))
        image_ids.append(image_name)
    # transform = transforms.Compose([transforms.ToTensor()]) # optional
    # images = [transform(image).to(device) for image in images] # optional
    logging.info('Processing using model...')
    inputs = processor(text=["A photo closely related to COVID-19.", "A photo irrelevant to COVID-19."], images=images, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(device)
    inputs['attention_mask'] = inputs['attention_mask'].to(device)
    inputs['pixel_values'] = inputs['pixel_values'].to(device)

    outputs = model(**inputs)
    # print(outputs.image_embeds)
    embeddings = outputs.image_embeds.detach().cpu().numpy().tolist()  # Convert to a list for database insertion
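Two optional tweaks to the loop body above, assuming the same processor and model: the object the processor returns has a .to() method, so the three per-key moves can be collapsed into one call, and torch.inference_mode() skips gradient bookkeeping during embedding extraction:

import torch

inputs = processor(
    text=["A photo closely related to COVID-19.", "A photo irrelevant to COVID-19."],
    images=images,
    return_tensors="pt",
    padding=True,
).to(device)  # one call moves every tensor in the batch

with torch.inference_mode():  # inference only; no gradients needed
    outputs = model(**inputs)
embeddings = outputs.image_embeds.cpu().numpy().tolist()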

I did this for my case:

import requests
import torch
from PIL import Image
from torchvision import transforms
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
model.to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "https://img.olympics.com/images/image/private/t_social_share_thumb/f_auto/primary/ufqgtgmmdvwrcqavdrhw"
image = Image.open(requests.get(url, stream=True).raw)

transform = transforms.Compose([transforms.ToTensor()])

image_transformed = transform(image).to(device)
inputs = processor(text=["basketball", "football", "tennis"], images=image_transformed, return_tensors="pt", padding=True)

inputs['input_ids'] = inputs['input_ids'].to(device)
inputs['attention_mask'] = inputs['attention_mask'].to(device)
inputs['pixel_values'] = inputs['pixel_values'].to(device)

outputs = model(**inputs)

logits_per_image = outputs.logits_per_image  # the image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the text labels gives probabilities

But compared to the CPU, running on the CUDA device changed the probabilities dramatically (in my case, for the worse)!

Do you know why?
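One likely culprit, worth verifying before blaming the device: transforms.ToTensor() already rescales pixel values to [0, 1], and the CLIP image processor rescales by 1/255 again by default, so the model ends up seeing nearly all-black images. If your CPU run passed the PIL image straight to the processor while the GPU run went through ToTensor first, the difference in probabilities would come from the preprocessing, not from CUDA. A sketch of the usual fix is to drop the manual transform and hand the PIL image directly to the processor:

# Pass the PIL image directly; the processor handles resizing,
# rescaling to [0, 1], and normalization itself.
inputs = processor(text=["basketball", "football", "tennis"],
                   images=image, return_tensors="pt", padding=True).to(device)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)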