Using transformers to get image features

marcomameli01 · May 9, 2022, 5:34pm

Hello everybody,
I would like to use a transformer to extract features from images at different levels, as is possible with a convolutional network, where we can take the output of convolutional layer 3, 4, or 5.
Is there a way to do something like that with transformers?

Or, more interestingly: how can I get the features, including the positional embeddings, from the transformer and use them as input to another kind of network with attention?
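A minimal sketch for this second question, assuming PyTorch: the token sequence that ViTModel returns already carries positional information, because its embedding step prepends the class token to the patch embeddings and adds the learned position embeddings, and every later layer is computed from that. The tokens can therefore serve as keys and values for another attention module such as torch.nn.MultiheadAttention. The small ViTConfig, the random weights and the `queries` tensor are illustrative assumptions so the sketch runs offline; for real features you would load a checkpoint with ViTModel.from_pretrained.

```python
import torch
from torch import nn
from transformers import ViTConfig, ViTModel

# Hypothetical small config with random weights so the sketch runs offline;
# swap in ViTModel.from_pretrained("google/vit-base-patch16-224-in21k") for real features.
config = ViTConfig(image_size=32, patch_size=8, hidden_size=64,
                   num_hidden_layers=2, num_attention_heads=4, intermediate_size=128)
vit = ViTModel(config).eval()

pixel_values = torch.randn(2, 3, 32, 32)  # stand-in for the feature extractor's pixel_values
with torch.no_grad():
    # (batch, 1 class token + 16 patch tokens, hidden_size); position info is already mixed in
    tokens = vit(pixel_values=pixel_values).last_hidden_state

# Downstream attention layer: hypothetical queries from another network attend over the ViT tokens.
cross_attn = nn.MultiheadAttention(embed_dim=config.hidden_size, num_heads=4, batch_first=True)
queries = torch.randn(2, 5, config.hidden_size)  # e.g. 5 query vectors per image
attended, attn_weights = cross_attn(queries, tokens, tokens)

print(tuple(tokens.shape), tuple(attended.shape), tuple(attn_weights.shape))
# (2, 17, 64) (2, 5, 64) (2, 5, 17)
```

Using the tokens as keys and values keeps the ViT frozen (under no_grad) while only the downstream attention layer would be trained.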

prithivida · May 9, 2022, 6:17pm

Sure, it can be done (for the first question). To extract features, use the bare model: for ViT, for instance, the bare model is named ViTModel, and by default most models return last_hidden_state (the output of the last layer) and pooler_output. To get all layers, set output_hidden_states=True in the forward pass (line 10). outputs.hidden_states is then a tuple holding the embedding output followed by the output of each layer, so you can pick the layers you want by index.

Consider this code

1. from transformers import ViTFeatureExtractor, ViTModel
2. import torch
3. from datasets import load_dataset

4. dataset = load_dataset("huggingface/cats-image")
5. image = dataset["test"]["image"][0]  # a single PIL image

6. feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")  # newer transformers releases deprecate this in favour of ViTImageProcessor
7. model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

8. inputs = feature_extractor(image, return_tensors="pt")  # pixel_values of shape (1, 3, 224, 224)

9. with torch.no_grad():
10.     outputs = model(**inputs, output_hidden_states=True)  # outputs.hidden_states: embedding output + one tensor per layer (13 for ViT-base)
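To mirror the CNN idea of tapping layers 3, 4 and 5, here is a sketch that indexes outputs.hidden_states (index 0 is the embedding output, index i the output of encoder layer i), drops the class token at position 0, and reshapes the remaining patch tokens into a (batch, channels, height, width) map. It assumes a small randomly initialised ViTConfig instead of the downloaded checkpoint so it runs offline; the helper name `layer_feature_map` and the chosen layer indices are illustrative.

```python
import torch
from transformers import ViTConfig, ViTModel

# Hypothetical small config with random weights (no download); with the real
# google/vit-base-patch16-224-in21k checkpoint the maps would be (B, 768, 14, 14).
config = ViTConfig(image_size=32, patch_size=8, hidden_size=64,
                   num_hidden_layers=6, num_attention_heads=4, intermediate_size=128)
model = ViTModel(config).eval()

pixel_values = torch.randn(2, 3, 32, 32)  # stand-in for inputs["pixel_values"]
with torch.no_grad():
    outputs = model(pixel_values=pixel_values, output_hidden_states=True)

hidden_states = outputs.hidden_states  # embedding output + one tensor per encoder layer


def layer_feature_map(hidden_states, layer, image_size, patch_size):
    """Reshape one layer's patch tokens into a CNN-style (B, C, H', W') map."""
    tokens = hidden_states[layer][:, 1:, :]  # drop the class token at position 0
    batch, num_patches, channels = tokens.shape
    side = image_size // patch_size  # patches per image side
    assert num_patches == side * side
    # patches are ordered row by row, so a plain reshape restores the grid
    return tokens.transpose(1, 2).reshape(batch, channels, side, side)


multi_level = {
    layer: layer_feature_map(hidden_states, layer, config.image_size, config.patch_size)
    for layer in (3, 4, 5)
}
for layer, fmap in multi_level.items():
    print(layer, tuple(fmap.shape))  # each map is (2, 64, 4, 4)
```

Unlike a CNN, every ViT layer keeps the same spatial resolution and channel count, so the "levels" differ in depth of processing rather than in scale.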

marcomameli01 · May 15, 2022, 10:16am

I tried this solution, but when I pass an image as input (a 3-channel image in a batch of size 1), I get the error KeyError: ((1, 1, 224, 224), '|u1'), and I can't find where the image becomes 1-channel.
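A likely cause, assuming the image was passed as one pre-batched, channels-first array of shape (1, 3, 224, 224): the 2022-era ViTFeatureExtractor converts each input to a PIL image before resizing, and PIL's Image.fromarray builds its lookup key as (1, 1) plus the shape from the third axis onward, together with the dtype string. A (1, 3, 224, 224) uint8 array therefore yields exactly the key ((1, 1, 224, 224), '|u1'), so the "1 channel" comes from PIL's key format rather than from the image losing channels. This sketch reproduces the error with PIL alone and splits the batch into channels-last RGB images that can be passed to the feature extractor as a list; `batch_to_pil_images` is a hypothetical helper.

```python
import numpy as np
from PIL import Image


def batch_to_pil_images(batch: np.ndarray) -> list:
    """Split an (N, 3, H, W) uint8 array into a list of RGB PIL images (hypothetical helper)."""
    if batch.ndim != 4 or batch.shape[1] != 3:
        raise ValueError(f"expected shape (N, 3, H, W), got {batch.shape}")
    # PIL expects channels last: (H, W, 3) per image
    channels_last = np.ascontiguousarray(batch.transpose(0, 2, 3, 1))
    return [Image.fromarray(img) for img in channels_last]


batch = np.zeros((1, 3, 224, 224), dtype=np.uint8)  # one 3-channel image, batched

try:
    Image.fromarray(batch)  # PIL looks up ((1, 1, 224, 224), '|u1') and fails
    reproduced = False
except TypeError as err:  # PIL re-raises its internal KeyError as a TypeError
    reproduced = True
    print("PIL rejects the 4-D array:", err)

images = batch_to_pil_images(batch)
print(len(images), images[0].size, images[0].mode)  # 1 (224, 224) RGB
# then, for example: inputs = feature_extractor(images, return_tensors="pt")
```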

mae338 · March 20, 2024, 3:04pm

Thank you for this valuable information. I have one question: I tried it and got a tuple of 13 tensors, each of shape 1x12x197x768. How can I find the class token feature vector?
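A minimal sketch for this follow-up, assuming PyTorch: ViT prepends the class token to the patch tokens, so its feature vector is position 0 along the sequence axis, i.e. `h[:, 0, :]` for any hidden state `h` of shape (batch, sequence_length, hidden_size). For vit-base-patch16-224 the sequence length is 197 (196 patches plus the class token). Note that pooler_output is not the raw class token: ViT's pooler passes it through a dense layer and an activation (tanh by default). The small random-weight config is an assumption to keep the sketch offline.

```python
import torch
from transformers import ViTConfig, ViTModel

# Hypothetical small config with random weights so the sketch runs without a download.
config = ViTConfig(image_size=32, patch_size=8, hidden_size=64,
                   num_hidden_layers=2, num_attention_heads=4, intermediate_size=128)
model = ViTModel(config).eval()

with torch.no_grad():
    outputs = model(pixel_values=torch.randn(1, 3, 32, 32), output_hidden_states=True)

# The class token sits at position 0 of the sequence axis.
cls_final = outputs.last_hidden_state[:, 0, :]               # (batch, hidden_size)
cls_per_layer = [h[:, 0, :] for h in outputs.hidden_states]  # one vector per hidden state

print(tuple(cls_final.shape), len(cls_per_layer), tuple(cls_per_layer[-1].shape))
# (1, 64) 3 (1, 64)
```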
