DiT outputs clarification

Francesco · August 2, 2023, 1:37pm

Hi there!

I'm trying to run DiT, but I cannot find any documentation on what the outputs are or what I should do with them. For example:

from transformers import BeitImageProcessor, BeitForMaskedImageModeling
import torch
from PIL import Image

image = Image.open('path_to_your_document_image').convert('RGB')

processor = BeitImageProcessor.from_pretrained("microsoft/dit-large")
model = BeitForMaskedImageModeling.from_pretrained("microsoft/dit-large")

num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# create random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, logits = outputs.loss, outputs.logits

Could you clarify what the logits represent? Happy to open a PR to fix the docs later.

Thanks
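
For anyone landing here with the same question, here is a minimal sketch of how the logits could be read. It assumes the checkpoint behaves like the standard BEiT masked-image-modeling head: `outputs.logits` has shape `(batch_size, num_patches, vocab_size)`, i.e. one score per visual token for every image patch, and `outputs.loss` stays `None` unless `labels` are passed. The random tensor is only a stand-in for real model output, and the sizes (196 patches, 8192 visual tokens) are assumed values for a 224x224 input with 16x16 patches.

```python
import torch

torch.manual_seed(0)

# Assumed sizes: 224x224 image, 16x16 patches, 8192 visual tokens.
batch_size, num_patches, vocab_size = 1, (224 // 16) ** 2, 8192

# Stand-in for outputs.logits (random values, NOT real model output).
logits = torch.randn(batch_size, num_patches, vocab_size)
# Same kind of random boolean mask as in the snippet above.
bool_masked_pos = torch.randint(low=0, high=2, size=(batch_size, num_patches)).bool()

# Most likely visual token id for every patch.
predicted_tokens = logits.argmax(dim=-1)  # (batch_size, num_patches)

# Keep only the predictions for the patches that were masked out.
masked_logits = logits[bool_masked_pos]  # (num_masked, vocab_size)
masked_predictions = predicted_tokens[bool_masked_pos]  # (num_masked,)

# Per-patch probability distribution over the visual vocabulary.
probs = masked_logits.softmax(dim=-1)

print(logits.shape, masked_logits.shape, masked_predictions.shape)
```

As far as I understand, the predicted ids index into the image tokenizer's codebook rather than pixel values, so turning them back into an image would need the matching tokenizer decoder, which this sketch does not cover.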
