Dear,
from the output of a ViT how I can reconstruct an image of features like the output of a CNN?
I need this kind of representation for a network that uses that representation for the information information.
Dear,
from the output of a ViT how I can reconstruct an image of features like the output of a CNN?
I need this kind of representation for a network that uses that representation for the information information.