Using transformer to get image features

prithivida · May 9, 2022, 6:17pm

Sure, it can be done (for the 1st question). To extract features, use the bare model. For instance, with ViT the bare model is named ViTModel, and by default most models return last_hidden_state (the last layer) and pooler_output. To get all layers, set output_hidden_states=True in the forward pass (line 10). You can then access every layer through outputs.hidden_states and pick the ones you want by index.

Consider this code:

1. from transformers import ViTFeatureExtractor, ViTModel
2. import torch
3. from datasets import load_dataset

4. dataset = load_dataset("huggingface/cats-image")
5. image = dataset["test"]["image"][0]

6. feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
7. model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

8. inputs = feature_extractor(image, return_tensors="pt")

9. with torch.no_grad():
10.     outputs = model(**inputs, output_hidden_states=True)
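
Indexing into the layers could look roughly like the sketch below. So that it runs without downloading weights, it builds a randomly initialised ViTModel from a tiny ViTConfig and feeds it a random tensor; those sizes and the variable names are assumptions for illustration only. With the pretrained checkpoint and `inputs` from the code above, the indexing should work the same way.

```python
import torch
from transformers import ViTConfig, ViTModel

# Tiny, randomly initialised ViT so the sketch runs without downloading weights.
# All sizes below are arbitrary assumptions, not the pretrained checkpoint's values.
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=16,
)
model = ViTModel(config)
model.eval()

# Stand-in for feature_extractor(image, return_tensors="pt")["pixel_values"].
pixel_values = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, output_hidden_states=True)

# hidden_states is a tuple: the embedding output followed by one entry per encoder layer.
hidden_states = outputs.hidden_states
num_entries = len(hidden_states)  # num_hidden_layers + 1

# Each entry has shape (batch, 1 + num_patches, hidden_size); token 0 is the [CLS] token.
second_to_last = hidden_states[-2]
cls_from_second_to_last = second_to_last[:, 0]

# Stack the [CLS] vector from every entry: (num_entries, batch, hidden_size).
cls_per_layer = torch.stack([h[:, 0] for h in hidden_states])

# Mean over the patch tokens (skipping [CLS]) is another possible image-level feature.
mean_patch_feature = outputs.last_hidden_state[:, 1:].mean(dim=1)
```

Which layer or pooling gives the best features depends on the downstream task, so it is worth comparing a few of these options.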
