How to convert ViTForMaskedImageModeling outputs to an image

Hi,

I would like to implement an image completion task based on MaskedImageModeling compatible models.
I interpreted outputs.logits as the reconstructed pixel values, but I couldn’t find any resources on how to convert these logits back to a PIL image.

Can anyone help with this or point me to relevant resources?
Thanks

Hi,

This notebook is probably helpful for that: Transformers-Tutorials/ViT_MAE_visualization_demo.ipynb in the NielsRogge/Transformers-Tutorials repository on GitHub. It’s illustrated for ViTMAE, but I assume the approach is similar for SimMIM models (which is what the xxxForMaskedImageModeling models are).
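
In case it helps, here is a minimal sketch of the last step, assuming the reconstruction for one image is a (3, H, W) float array in the same normalized space as the model input. The helper names reconstruction_to_uint8 and reconstruction_to_pil are made up for illustration, and the default mean and std of 0.5 are an assumption; read the real values from your image processor (image_processor.image_mean and image_processor.image_std).

```python
import numpy as np


def reconstruction_to_uint8(pixels, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Map a normalized (3, H, W) float array to an (H, W, 3) uint8 array.

    Hypothetical helper. mean/std are assumed defaults; use the values
    from the image processor that prepared the model input.
    """
    pixels = np.asarray(pixels, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)

    # Undo the preprocessing step x_norm = (x - mean) / std.
    unnormalized = pixels * std + mean
    # The model output is unbounded, so clip to the valid [0, 1] range.
    unnormalized = np.clip(unnormalized, 0.0, 1.0)
    # Channels-first (C, H, W) to channels-last (H, W, C).
    hwc = np.transpose(unnormalized, (1, 2, 0))
    return np.round(hwc * 255.0).astype(np.uint8)


def reconstruction_to_pil(pixels, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Wrap the uint8 array in a PIL image (requires Pillow)."""
    # Imported here so the array helper above works without Pillow.
    from PIL import Image

    return Image.fromarray(reconstruction_to_uint8(pixels, mean, std))
```

With a PyTorch model output you would pass something like outputs.logits[0].detach().cpu().numpy() to reconstruction_to_pil. As far as I know, recent versions of transformers expose the same tensor as outputs.reconstruction, so check which attribute your version provides.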