How to convert ViTForMaskedImageModeling outputs to image

Antoine · August 23, 2022, 11:48am

Hi,

I would like to implement an image completion task using models compatible with MaskedImageModeling.
I interpreted outputs.logits as the reconstructed pixel values, but I couldn’t find any resources on how to convert these logits back into a PIL image.

Can anyone help with this or point me to relevant resources?
Thanks

nielsr · August 23, 2022, 12:05pm

Hi,

This notebook is probably helpful for that: ViT_MAE_visualization_demo.ipynb in the NielsRogge/Transformers-Tutorials repository on GitHub. It’s illustrated for ViTMAE, but I assume the approach is similar for SimMIM models (which is what the xxxForMaskedImageModeling models are).
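
For the conversion step itself, here is a minimal sketch of what it might look like. It is not taken from the notebook. It assumes the reconstructed pixels for one image are a (channels, height, width) array in the same normalized space as the model's pixel_values, and that normalization used a mean and std of 0.5 per channel. The helper name reconstruction_to_pil and the constants IMAGE_MEAN and IMAGE_STD are made up for illustration; check the image_mean and image_std of your own feature extractor.

```python
import numpy as np
from PIL import Image

# Assumed normalization constants (hypothetical defaults for this sketch).
# Replace them with the image_mean / image_std of your feature extractor.
IMAGE_MEAN = (0.5, 0.5, 0.5)
IMAGE_STD = (0.5, 0.5, 0.5)


def reconstruction_to_pil(pixels, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Convert one normalized (channels, height, width) array to a PIL image."""
    pixels = np.asarray(pixels, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)

    # Undo the per-channel normalization: x_norm = (x - mean) / std.
    pixels = pixels * std + mean
    # The model can predict values outside the valid range, so clip to [0, 1].
    pixels = np.clip(pixels, 0.0, 1.0)
    # Scale to 8-bit and move channels last (CHW -> HWC), as PIL expects.
    pixels = (pixels * 255.0).round().astype(np.uint8)
    return Image.fromarray(np.ascontiguousarray(pixels.transpose(1, 2, 0)))
```

If outputs.logits has shape (batch, channels, height, width), which is an assumption worth checking by printing outputs.logits.shape, the call would be something like reconstruction_to_pil(outputs.logits[0].detach().cpu().numpy()). For image completion you would probably want to paste only the masked patches from the reconstruction into the original image, since the unmasked regions are not guaranteed to be reconstructed well.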
