I’m trying to obtain ViT image embeddings, but I get completely different embeddings for the same image across multiple inference runs. Shouldn’t the embedding be deterministic when I run inference on the same image?

Here is my code:
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

device = torch.device('cuda')
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTModel.from_pretrained('google/vit-base-patch16-224')
model.to(device)
model.eval()

vit_vectors = []
for file_path in file_paths:
    # Force RGB so grayscale/RGBA files don't trip up the processor
    image = Image.open(file_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # pooler_output has shape (batch, hidden); take the single image's vector
    embedding = outputs.pooler_output[0].detach().cpu().numpy().copy()
    vit_vectors.append(embedding)
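For concreteness, this is roughly how I compare two passes over the same image. Note that embed_image is just the loop body above factored into a helper for this repro (not part of my actual pipeline), and it reuses the processor, model, and device set up earlier:

import numpy as np

def embed_image(file_path):
    # Same steps as the loop above, factored out for the repro
    image = Image.open(file_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.pooler_output[0].detach().cpu().numpy().copy()

emb_a = embed_image(file_paths[0])
emb_b = embed_image(file_paths[0])
print(np.allclose(emb_a, emb_b))  # I expected True for the same image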