Dear all,
how can I reconstruct a feature image from the output of a ViT, similar to the feature map a CNN produces?
I need this kind of representation because another network takes it as its input.
Hello,
I need to obtain an output like the feature images shown in the video for that image.
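One common way to get a CNN-like feature map from a ViT is to drop the class token and fold the remaining patch tokens back onto the patch grid. The sketch below is only an illustration under stated assumptions: the ViT output has shape (batch, tokens, dim), the class token (if any) comes first, and the patch tokens are in row-major order. The function name `tokens_to_feature_map` and the sizes used (14x14 grid, 768 dimensions, as for a 224x224 input with 16x16 patches) are hypothetical and not taken from any specific library. NumPy is used to keep it self-contained; the same reshape and axis permutation should carry over to a tensor library.

```python
import numpy as np


def tokens_to_feature_map(tokens, grid_h, grid_w, has_cls_token=True):
    """Fold ViT tokens (B, N, D) into a CNN-style map (B, D, grid_h, grid_w).

    Assumes the class token, if present, is the first token and that patch
    tokens are ordered row by row (row-major) over the patch grid.
    """
    if has_cls_token:
        tokens = tokens[:, 1:, :]  # drop the class token
    b, n, d = tokens.shape
    if n != grid_h * grid_w:
        raise ValueError(f"expected {grid_h * grid_w} patch tokens, got {n}")
    fmap = tokens.reshape(b, grid_h, grid_w, d)  # (B, H, W, D)
    return fmap.transpose(0, 3, 1, 2)            # (B, D, H, W), channels first


# Stand-in for a ViT output: 1 class token + 14*14 patch tokens, 768 dims each.
tokens = np.random.rand(2, 1 + 14 * 14, 768).astype(np.float32)
fmap = tokens_to_feature_map(tokens, grid_h=14, grid_w=14)
print(fmap.shape)  # (2, 768, 14, 14)
```

Each spatial position of `fmap` then holds the embedding of one patch, so it can be passed to a network that expects channels-first feature maps. If a higher spatial resolution is needed, the map would have to be upsampled separately.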