CLIPVisionModel Padding Problem

The documentation on CLIPVisionModel says:

Pixel values. Padding will be ignored by default should you provide it. Pixel values can be obtained using [AutoImageProcessor]

However, when I pad my image with image_transforms.pad and run a forward pass, the results differ substantially from those for the original image.

from transformers.image_transforms import pad
import numpy as np
import torch
from transformers import CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# Example image as a NumPy array
image = np.random.rand(224, 224, 3)  # Height x Width x Channels

# Define padding: ((before_height, after_height), (before_width, after_width))
padding = ((0, 0), (112, 112))  # Pads width to make it 448

# Apply padding
padded_image = pad(image, padding=padding)
print("Original Image Shape:", image.shape)
print("Padded Image Shape:", padded_image.shape)

# Convert to NCHW float32 tensors (the model's weights are float32)
image_torch = torch.tensor(image).permute(2, 0, 1).unsqueeze(0).float()
padded_image_torch = torch.tensor(padded_image).permute(2, 0, 1).unsqueeze(0).float()

print("Original Image Shape (Torch):", image_torch.shape)
print("Padded Image Shape (Torch):", padded_image_torch.shape)
# Pass both images through the model (interpolate_pos_encoding is needed
# because the padded image is no longer 224x224)
with torch.no_grad():
    outputs_padded = model(pixel_values=padded_image_torch, interpolate_pos_encoding=True)
    outputs_original = model(pixel_values=image_torch)

# Compare the pooled outputs
original = outputs_original.pooler_output
padded = outputs_padded.pooler_output

print(torch.mean(torch.abs(original - padded)))
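
For reference, this is how I would normally obtain pixel_values via the image processor the docs mention. A minimal sketch, continuing from the snippet above and assuming the openai/clip-vit-base-patch32 checkpoint; the processor resizes, center-crops and normalizes to 224x224 on its own:

from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

# The processor expects PIL images or uint8 arrays, so rescale the float image first
pil_image = Image.fromarray((image * 255).astype(np.uint8))
inputs = processor(images=pil_image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])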

How should image padding be handled properly?


It’s a bug, no matter how we look at it. Please see the behavior below.
It looks like we should submit an issue or PR on GitHub…

from transformers.image_transforms import pad
import numpy as np
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel
model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image as a NumPy array
image = np.random.rand(224, 224, 3)  # Height x Width x Channels
image_uint8 = (image * 255).astype(np.uint8)  # uint8 copy so PIL can interpret it correctly
image_pil = np.array(Image.fromarray(image_uint8))  # round-trip through PIL

# Define padding: ((before_height, after_height), (before_width, after_width))
padding = ((0, 0), (112, 112))  # Pads width to make it 448

# Apply padding
padded_image = pad(image, padding=padding)
padded_image_pil = pad(image_pil, padding=padding)
print("Original Image Shape:", image.shape)
print("Padded Image Shape:", padded_image.shape)

# Convert to NCHW float32 tensors (the model's weights are float32)
image_torch = torch.tensor(image).permute(2, 0, 1).unsqueeze(0).float()
padded_image_torch = torch.tensor(padded_image).permute(2, 0, 1).unsqueeze(0).float()

print("Original Image Shape (Torch):", image_torch.shape)
print("Padded Image Shape (Torch):", padded_image_torch.shape)
# Pass both images through the model (interpolate_pos_encoding is needed
# because the padded image is no longer 224x224)
with torch.no_grad():
    outputs_padded = model(pixel_values=padded_image_torch, interpolate_pos_encoding=True)
    outputs_original = model(pixel_values=image_torch)

# Compare the pooled outputs
original = outputs_original.pooler_output
padded = outputs_padded.pooler_output

print(torch.mean(torch.abs(original - padded)))

# Save the images for visual inspection (rescale the float arrays to uint8 first,
# since PIL cannot interpret float64 data as 'RGB')
original_im = Image.fromarray(image_uint8)
padded_im = Image.fromarray((np.clip(padded_image, 0, 1) * 255).astype(np.uint8))
padded_im_pil = Image.fromarray(padded_image_pil)
original_im.save("_pad_original.png")       # original image
padded_im.save("_pad_padded.png")           # padded float image
padded_im_pil.save("_pad_padded_pil.png")   # padded PIL round-trip image

I opened an issue.
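
In the meantime, a possible workaround is to let the image processor build pixel_values for both the original and the padded image, so that both go through the same resizing and normalization instead of being fed to the model as raw arrays. A minimal sketch, reusing the model, processor and arrays from the snippet above (note the processor resizes the padded image back to 224x224, so this is not a pixel-exact comparison):

# Build model inputs with the processor instead of raw tensors
inputs_original = processor(images=Image.fromarray(image_uint8), return_tensors="pt")
inputs_padded = processor(images=Image.fromarray(padded_image_pil), return_tensors="pt")

with torch.no_grad():
    pooled_original = model(**inputs_original).pooler_output
    pooled_padded = model(**inputs_padded).pooler_output

# The embeddings will still differ somewhat, since padding changes the image content,
# but at least both inputs are preprocessed the same way.
print(torch.mean(torch.abs(pooled_original - pooled_padded)))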