Load CLIP pretrained model on GPU

I’m using CLIP to find similarities between text and images, but I realized the pretrained models load on the CPU, which is slow. I want to load them on the GPU instead. How can I do that?

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

Thanks!

Here’s how you can put a model on the GPU (the same works for any PyTorch model):

import torch
from transformers import CLIPModel

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
model.to(device)

Yes, but my issue is with the second line. When I tried to send it to the GPU I got the error 'CLIPProcessor' object has no attribute 'cuda', and to run the code on the GPU I thought I needed to send both the model and the processor there:

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

Do you know how I can send the CLIPProcessor to the GPU?

You cannot move a processor to the GPU; it isn’t a PyTorch module. It only prepares the data for the model (tokenizing text, resizing and normalizing images), and that runs on the CPU.

The only things you need to move to the GPU are the model and the data the processor returns.
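For CLIP, that means the dict of tensors the processor produces. A minimal sketch (the image path and text labels are placeholders):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path; any PIL image works
inputs = processor(text=["a cat", "a dog"], images=image,
                   return_tensors="pt", padding=True)
inputs = inputs.to(device)  # moves the returned tensors, not the processor
outputs = model(**inputs)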

I ran the following, which worked!

from torchvision import transforms

# Convert each PIL image to a PyTorch tensor (then move it to CUDA or CPU)
transform = transforms.Compose([transforms.ToTensor()])
images = [transform(image).to(device) for image in images]

# The processor returns a dict of tensors; move the pixel values to the device
image_processor = processor(images=images, return_tensors="pt", padding=True)
image_processor['pixel_values'] = image_processor['pixel_values'].to(device)
embeddings = model.get_image_features(**image_processor)
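A small simplification, in case it helps: the processor also accepts PIL images directly, so the ToTensor step isn’t strictly required, and the object the processor returns has a .to() method that moves all of its tensors in one call. A sketch, assuming the same images, processor, model, and device as above:

inputs = processor(images=images, return_tensors="pt", padding=True)  # PIL images in
inputs = inputs.to(device)  # moves pixel_values (and any other tensors) to the GPU
embeddings = model.get_image_features(**inputs)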

Adding to @lukeigel:

import os
import logging

from PIL import Image

# Process images in batches
for batch in batches:  # `batches` holds lists of image file names
    images = []
    image_ids = []

    # Load the images for the current batch
    for image_name in batch:
        image_path = os.path.join(image_dir, image_name)
        images.append(Image.open(image_path))
        image_ids.append(image_name)
    # transform = transforms.Compose([transforms.ToTensor()]) # optional
    # images = [transform(image).to(device) for image in images] # optional
    logging.info('Processing using model...')
    inputs = processor(text=["A photo closely related to COVID-19.", "A photo irrelevant to COVID-19."], images=images, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(device)
    inputs['attention_mask'] = inputs['attention_mask'].to(device)
    inputs['pixel_values'] = inputs['pixel_values'].to(device)

    outputs = model(**inputs)
    # print(outputs.image_embeds)
    embeddings = outputs.image_embeds.detach().cpu().numpy().tolist()  # Convert to a list for database insertion
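Two optional tweaks to the loop body above, assuming the same processor and model: the object the processor returns has a .to() method, so the three per-key moves can be collapsed into one call, and torch.inference_mode() skips gradient bookkeeping during embedding extraction:

import torch

inputs = processor(
    text=["A photo closely related to COVID-19.", "A photo irrelevant to COVID-19."],
    images=images,
    return_tensors="pt",
    padding=True,
).to(device)  # one call moves every tensor in the batch

with torch.inference_mode():  # inference only; no gradients needed
    outputs = model(**inputs)
embeddings = outputs.image_embeds.cpu().numpy().tolist()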

I did this for my case:

import requests
import torch
from PIL import Image
from torchvision import transforms
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
model.to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "https://img.olympics.com/images/image/private/t_social_share_thumb/f_auto/primary/ufqgtgmmdvwrcqavdrhw"
image = Image.open(requests.get(url, stream=True).raw)

transform = transforms.Compose([transforms.ToTensor()])

image_transformed = transform(image).to(device)
inputs = processor(text=["basketball", "football", "tennis"], images=image_transformed, return_tensors="pt", padding=True)

inputs['input_ids'] = inputs['input_ids'].to(device)
inputs['attention_mask'] = inputs['attention_mask'].to(device)
inputs['pixel_values'] = inputs['pixel_values'].to(device)

outputs = model(**inputs)

logits_per_image = outputs.logits_per_image  # the image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the text labels gives probabilities

But compared to the CPU, running on the CUDA device changed the probabilities dramatically (in my case, for the worse)!

Do you know why?
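One likely culprit, worth verifying before blaming the device: transforms.ToTensor() already rescales pixel values to [0, 1], and the CLIP image processor rescales by 1/255 again by default, so the model ends up seeing nearly all-black images. If your CPU run passed the PIL image straight to the processor while the GPU run went through ToTensor first, the difference in probabilities would come from the preprocessing, not from CUDA. A sketch of the usual fix is to drop the manual transform and hand the PIL image directly to the processor:

# Pass the PIL image directly; the processor handles resizing,
# rescaling to [0, 1], and normalization itself.
inputs = processor(text=["basketball", "football", "tennis"],
                   images=image, return_tensors="pt", padding=True).to(device)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)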