MMing
August 10, 2022, 8:10am
1
My code is as follows; how can I get identical encodings from the same image?
import torch
from transformers import ViTMAEModel
pixel_value = torch.randn([1, 3, 224, 224])
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base").eval()  # switch to evaluation mode
encoding_a = model(pixel_value).last_hidden_state[:, 1:, :].mean(dim=1)
encoding_b = model(pixel_value).last_hidden_state[:, 1:, :].mean(dim=1)
# print(encoding_a == encoding_b)
assert torch.equal(encoding_a, encoding_b)  # fails: the two calls give different encodings!
MMing
August 10, 2022, 9:23am
2
I have handled the problem: just set mask_ratio = 0.0. But I don't know whether this approach is valid for inference?
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base", mask_ratio=0.0).eval()  # switch to evaluation mode
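For reference, here is the full check with that change (a minimal sketch reusing the snippet from post 1; per the observation above, the assert should now pass):

import torch
from transformers import ViTMAEModel

pixel_values = torch.randn([1, 3, 224, 224])
# mask_ratio=0.0 keeps every patch, so no random masking is applied
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base", mask_ratio=0.0).eval()

with torch.no_grad():  # inference only, no autograd graph needed
    encoding_a = model(pixel_values).last_hidden_state[:, 1:, :].mean(dim=1)
    encoding_b = model(pixel_values).last_hidden_state[:, 1:, :].mean(dim=1)

assert torch.equal(encoding_a, encoding_b)  # same input, same encoding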
nielsr
August 10, 2022, 9:27am
3
Hi,
The model internally generates a random boolean mask, as seen here. To make it reproducible, one can provide a noise argument to the forward method (to make sure the same boolean mask is applied):
import numpy as np
import torch
from transformers import ViTMAEModel
pixel_values = torch.randn([1, 3, 224, 224])
model = ViTMAEModel.from_pretrained("facebook/vit-mae-base")
num_patches = int((model.config.image_size // model.config.patch_size) ** 2)
noise = np.random.uniform(size=(1, num_patches))
encoding_a = model(pixel_values, noise=torch.from_numpy(noise)).last_hidden_state[:, 1:, :].mean(dim=1)
encoding_b = model(pixel_values, noise=torch.from_numpy(noise)).last_hidden_state[:, 1:, :].mean(dim=1)
assert torch.equal(encoding_a, encoding_b)
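The noise tensor can also be built with torch directly (a small variant of the snippet above; any fixed tensor of shape (batch_size, num_patches) should work):

noise = torch.rand(1, num_patches)  # created once, reused for both calls
encoding_a = model(pixel_values, noise=noise).last_hidden_state[:, 1:, :].mean(dim=1)
encoding_b = model(pixel_values, noise=noise).last_hidden_state[:, 1:, :].mean(dim=1)
assert torch.equal(encoding_a, encoding_b)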
MMing
August 10, 2022, 12:24pm
5
Thanks for your reply.
But I found that we just need to reload the model with the parameter mask_ratio = 0.0 (which lets the model see all patches); then we can get reproducible encodings from the same image.
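(Note the difference between the two fixes: passing a fixed noise tensor keeps the random masking but makes the mask deterministic, while mask_ratio = 0.0 turns masking off entirely so the encoder sees the full image.)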