Why does ViTForMaskedImageModeling not reconstruct the original image correctly?

I was trying to do masked image modeling with Hugging Face and came across ViTForMaskedImageModeling in the documentation, but I don't understand how it reconstructs the original image from loss, reconstructed_pixel_values = outputs.loss, outputs.reconstruction. Also, it doesn't reconstruct the original image correctly: all I get is noise.

import requests
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import AutoImageProcessor, ViTForMaskedImageModeling

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForMaskedImageModeling.from_pretrained("google/vit-base-patch16-224-in21k")

# the image is split into (224 / 16) ** 2 = 196 patches
num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
# create a random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

# the forward pass returns the reconstruction loss over the masked patches and
# the reconstructed image of shape (batch_size, num_channels, height, width)
outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, reconstructed_pixel_values = outputs.loss, outputs.reconstruction

# the processor normalized the image, so undo that before plotting;
# otherwise imshow clips everything outside [0, 1]
reconstructed_pixel_values = reconstructed_pixel_values[0].detach().numpy()
reconstructed_pixel_values = np.transpose(reconstructed_pixel_values, (1, 2, 0))
mean, std = np.array(image_processor.image_mean), np.array(image_processor.image_std)
reconstructed_pixel_values = np.clip(reconstructed_pixel_values * std + mean, 0, 1)

plt.imshow(reconstructed_pixel_values)
plt.show()
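
In case it helps, this is the sketch I use to look at the masked input next to the reconstruction. It reuses the variables defined above; the only extra assumption is that the processor exposes image_mean and image_std attributes, which it does for this checkpoint.

patch = model.config.patch_size
grid = model.config.image_size // patch  # 14 patches per side

# expand the (1, 196) patch-level mask to a (224, 224) pixel-level mask
pixel_mask = np.kron(bool_masked_pos.reshape(grid, grid).numpy(), np.ones((patch, patch)))

def to_image(chw):
    # undo the normalization and move channels last for imshow
    hwc = np.transpose(chw, (1, 2, 0))
    mean, std = np.array(image_processor.image_mean), np.array(image_processor.image_std)
    return np.clip(hwc * std + mean, 0, 1)

original = to_image(pixel_values[0].numpy())
masked_input = original * (1 - pixel_mask[..., None])  # black out the masked patches

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
titles = ["input", "masked input", "reconstruction"]
for ax, img, title in zip(axes, [original, masked_input, reconstructed_pixel_values], titles):
    ax.imshow(img)
    ax.set_title(title)
    ax.axis("off")
plt.show()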