DiT outputs clarification

Hi there!

Trying to run DiT but I cannot see any doc about what are the outputs and what I should do with them, example

from transformers import BeitImageProcessor, BeitForMaskedImageModeling
import torch
from PIL import Image

image = Image.open('path_to_your_document_image').convert('RGB')

processor = BeitImageProcessor.from_pretrained("microsoft/dit-large")
model = BeitForMaskedImageModeling.from_pretrained("microsoft/dit-large")

num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# create random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, logits = outputs.loss, outputs.logits

could you provide more clarification about logit? Happy to PR a fix for the doc later

Thanks