Hidden_states Transformers for computer vision

kanlions · July 21, 2022, 12:56pm

Hi all,
I am having trouble interpreting how hidden_states and last_hidden_state are indexed in transformer models for computer vision.

Which layer's output is last_hidden_state? For example, with a Swin Transformer tiny model, hidden_states is a tuple of 5 tensors with sizes 3136x96, 784x192, 196x384, 49x768 and 49x768 respectively. I inspected them, but I could not find last_hidden_state among the entries of the hidden_states tuple.
I ran into a similar problem with ViT models too.
Could anyone please help me understand these embeddings from the model output class, especially for computer vision transformers? I am trying to get some interpretability out of the model outputs.

Thanks in advance
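
For reference, the tuple sizes in the question can be reproduced with plain arithmetic. The sketch below is only an illustration, not the library's code. It assumes Swin-Tiny style settings (224x224 input, patch size 4, embedding dimension 96, four stages), assumes patch merging after every stage except the last, and assumes hidden_states holds the patch-embedding output followed by one entry per stage. The function name swin_hidden_state_shapes is hypothetical.

```python
def swin_hidden_state_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (num_tokens, channels) for the embedding output and each stage output.

    Assumption: every stage except the last ends with patch merging, which
    halves the spatial side length and doubles the channel count.
    """
    side = image_size // patch_size   # tokens per side after patch embedding
    dim = embed_dim
    shapes = [(side * side, dim)]     # entry 0: patch embedding output
    for stage in range(num_stages):
        if stage < num_stages - 1:    # patch merging on all but the last stage
            side //= 2
            dim *= 2
        shapes.append((side * side, dim))
    return shapes


shapes = swin_hidden_state_shapes()
for i, (tokens, channels) in enumerate(shapes):
    print(f"hidden_states[{i}]: {tokens} x {channels}")
```

Under these assumptions the tuple has five entries, and the last two share the same size because the final stage does not downsample. As far as I can tell, the Swin and ViT models in Transformers also apply a final layer norm before returning last_hidden_state, which would explain why it has the same shape as hidden_states[-1] but different values. That is worth checking against the model source for the version in use.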
