Armin
October 20, 2021, 3:42pm
1
I’m using CLIP to find similarities between text and images, but I realized the pretrained models load on the CPU. I want to load them on the GPU instead, since the CPU is too slow. How can I load them on the GPU?
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
Thanks!
nielsr
October 21, 2021, 8:16am
2
Here’s how you can put a model on the GPU (the same applies to any PyTorch model):
import torch
from transformers import CLIPModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
model.to(device)
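To verify where the model ended up, you can check the device of its parameters (a quick sanity check, not part of the snippet above):

print(next(model.parameters()).device)  # e.g. cuda:0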
Armin
October 21, 2021, 8:32am
3
Yes, but my issue is with the second line. I tried to send it to the GPU, but I get: 'CLIPProcessor' object has no attribute 'cuda'.
To run the code on the GPU, I need to send both the model and the processor to the GPU:
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
Do you know how I can send the CLIPProcessor to the GPU?
nielsr
October 21, 2021, 9:17am
4
You cannot move a processor to the GPU; it only prepares data for the model.
The only things you need to move to the GPU are the model and the data, i.e. the tensors the processor returns.
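For example, here is a minimal sketch of that pattern (the image file and the prompts are placeholders). The processor's output is a BatchEncoding, so all of its tensors can be moved to the GPU in a single call:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder image file
# The processor runs on the CPU; move its output tensors to the GPU afterwards
inputs = processor(text=["a cat", "a dog"], images=image, return_tensors="pt", padding=True)
inputs = inputs.to(device)
outputs = model(**inputs)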
lukeigel
I ran the following, which worked:
from torchvision import transforms

# Convert each PIL image in `images` to a PyTorch tensor on the target device
transform = transforms.Compose([transforms.ToTensor()])
images = [transform(image).to(device) for image in images]

inputs = processor(images=images, return_tensors="pt", padding=True)
inputs['pixel_values'] = inputs['pixel_values'].to(device)
embeddings = model.get_image_features(**inputs)
Adding to @lukeigel:
import logging
import os

from PIL import Image

# Process images in batches (assumes `batches` is an iterable of lists of file names)
for batch in batches:
    images = []
    image_ids = []

    # Load the images for the current batch
    for image_name in batch:
        image_path = os.path.join(image_dir, image_name)
        images.append(Image.open(image_path))
        image_ids.append(image_name)

    # transform = transforms.Compose([transforms.ToTensor()])  # optional
    # images = [transform(image).to(device) for image in images]  # optional

    logging.info('Processing using model...')
    inputs = processor(text=["A photo closely related to COVID-19.", "A photo irrelevant to COVID-19."], images=images, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(device)
    inputs['attention_mask'] = inputs['attention_mask'].to(device)
    inputs['pixel_values'] = inputs['pixel_values'].to(device)

    outputs = model(**inputs)
    # print(outputs.image_embeds)
    embeddings = outputs.image_embeds.detach().cpu().numpy().tolist()  # convert to a list for database insertion
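For completeness, here is one hypothetical way to build the `batches` variable used above (the directory name and batch size are placeholders):

import os

image_dir = "images"  # placeholder directory of image files
batch_size = 32  # placeholder batch size

file_names = sorted(os.listdir(image_dir))
batches = [file_names[i:i + batch_size] for i in range(0, len(file_names), batch_size)]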
I did this for my case:
import requests
import torch
from PIL import Image
from torchvision import transforms
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
model.to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "https://img.olympics.com/images/image/private/t_social_share_thumb/f_auto/primary/ufqgtgmmdvwrcqavdrhw"
image = Image.open(requests.get(url, stream=True).raw)

transform = transforms.Compose([transforms.ToTensor()])
image_transformed = transform(image).to(device)

inputs = processor(text=["basketball", "football", "tennis"], images=image_transformed, return_tensors="pt", padding=True)
inputs['input_ids'] = inputs['input_ids'].to(device)
inputs['attention_mask'] = inputs['attention_mask'].to(device)
inputs['pixel_values'] = inputs['pixel_values'].to(device)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax gives the label probabilities
But compared to the CPU, the CUDA device changed the probabilities dramatically (in my case, for the worse).
Do you know why?
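Not a confirmed diagnosis, but one thing worth checking: moving to CUDA by itself should only change the logits by tiny floating-point amounts. A more likely suspect is the manual ToTensor step, since the processor expects raw (e.g. PIL) images and applies its own resizing, rescaling, and normalization; depending on the library version, feeding it an already-rescaled 0-1 tensor can distort pixel_values. A minimal comparison sketch that skips the manual transform:

import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

url = "https://img.olympics.com/images/image/private/t_social_share_thumb/f_auto/primary/ufqgtgmmdvwrcqavdrhw"
image = Image.open(requests.get(url, stream=True).raw)

# Pass the raw PIL image and let the processor handle all preprocessing
inputs = processor(text=["basketball", "football", "tennis"], images=image, return_tensors="pt", padding=True).to(device)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)  # should match the CPU result up to small numerical differences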