Swin Transformer hidden states (feature maps) differ from last_hidden_state

from transformers import AutoFeatureExtractor, SwinModel
import torch
from PIL import Image
import requests

# Load a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-base-patch4-window7-224")
model = SwinModel.from_pretrained("microsoft/swin-base-patch4-window7-224", output_hidden_states=True)

# Preprocess the image into a batch of pixel values
inputs = feature_extractor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
hidden_states = outputs.hidden_states


Hi, I am using a Swin Transformer to extract features (hidden states) from an image.
I expected last_hidden_state to be the same as hidden_states[-1], but comparing them returns False. Is there anything I missed?


In this case you haven’t specified output_hidden_states=True in the forward pass of the model, so outputs.hidden_states will be None.
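A minimal sketch of passing output_hidden_states=True directly to the forward call. To keep it lightweight it uses a tiny, randomly initialized SwinConfig instead of downloading the pretrained checkpoint; the same call works unchanged with SwinModel.from_pretrained above:

```python
import torch
from transformers import SwinConfig, SwinModel

# Tiny randomly initialized Swin model, purely to illustrate the API;
# swap in SwinModel.from_pretrained(...) to get real features.
config = SwinConfig(
    image_size=32, patch_size=4, embed_dim=24,
    depths=[1, 1], num_heads=[2, 2], window_size=4,
)
model = SwinModel(config)
model.eval()

# A dummy batch of one 32x32 RGB image
pixel_values = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    # Request the intermediate feature maps in the forward call itself
    outputs = model(pixel_values, output_hidden_states=True)

print(outputs.hidden_states is not None)  # True
print(len(outputs.hidden_states))         # embedding output + one entry per stage
```

With the flag set in the forward call, outputs.hidden_states is populated with the embedding output followed by each stage's feature map, so comparisons against outputs.last_hidden_state become possible.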