What should be the correct feature shape for image - extracted using Swin Transformer?

polarfatbear · July 3, 2022, 8:01pm

Hi! I am new to the Hugging Face transformers library.
I have run into a problem and hope someone can help me out.

What I am trying to do: I am building a gender classifier from 5k RGB images of size 32x32.
I am using SwinForImageClassification. I was able to train it and reach roughly 80% accuracy.
Now I am trying to get the image features only. I tried using SwinModel to extract just the features (after reading this: Using trasnsformer to get image features).

I am getting a feature shape of [494, 49, 768] on a training set of 3952 images.
According to the example found here (Swin Transformer), the shape seems OK to me.

The problem I am facing: my supervisor says that for 5k images the feature shape should be something like [5000, 1024] for the Swin Base model.

How do I achieve this? Any suggestions?
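
To make the two shapes easier to compare, here is a rough sketch of the shape arithmetic only, with no model involved. It assumes the usual Swin setup (inputs resized to 224x224, patch size 4, four stages, base channel width 96 for the Tiny variant and 128 for Base); these values and the helper name `swin_feature_shapes` are assumptions for illustration, not taken from the post. Under those assumptions, [N, 49, 768] looks like the per-token output of a Tiny-sized model, while [N, 1024] would be a Base-sized model after averaging over the 49 tokens.

```python
def swin_feature_shapes(num_images, image_size=224, patch_size=4,
                        embed_dim=96, num_stages=4):
    """Return (per-token shape, pooled shape) for a Swin-style encoder.

    All defaults are assumed values for illustration (hypothetical helper).
    """
    # Patch embedding splits each image side into image_size // patch_size patches.
    side = image_size // patch_size
    # Every stage after the first halves the grid side and doubles the channels.
    merges = num_stages - 1
    side //= 2 ** merges
    hidden = embed_dim * 2 ** merges
    tokens = side * side
    # Per-token features keep the token axis; pooled features average it away.
    return (num_images, tokens, hidden), (num_images, hidden)


# Assumed Tiny-sized width (96) on the 3952 training images from the post.
tiny_tokens, tiny_pooled = swin_feature_shapes(3952, embed_dim=96)
# Assumed Base-sized width (128) on 5000 images.
base_tokens, base_pooled = swin_feature_shapes(5000, embed_dim=128)

print(tiny_tokens, tiny_pooled)
print(base_tokens, base_pooled)
```

If this reading is right, two things would have to change to reach [5000, 1024]: loading a Base-sized checkpoint instead of a Tiny-sized one, and reducing over the token axis (for example by taking the model's pooled output rather than the last hidden state, or averaging over axis 1). Separately, 3952 / 494 is exactly 8, which may mean only one row per batch of 8 is being kept during extraction, but that is a guess from the numbers alone.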
