Sure, it can be done (for the first question). To extract features, use the bare model. For ViT, for instance, the bare model class is ViTModel. By default, most models return last_hidden_state (the output of the last layer) and pooler_output. To get all layers, set output_hidden_states=True (line 10) in the forward pass. The output then also carries hidden_states, a tuple with one tensor per layer (the embedding output first, then each encoder layer), so you can pick any layer by its index.
Consider this code:
```python
from transformers import ViTFeatureExtractor, ViTModel
import torch
from datasets import load_dataset
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]  # a sample PIL image
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")  # bare model, no classification head
inputs = feature_extractor(image, return_tensors="pt")  # resize/normalize into pixel_values
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs, output_hidden_states=True)  # also return every layer's hidden states
```
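Here is a sketch of how you might turn those per-layer outputs into one feature vector per image. The helper extract_layer_features is hypothetical (it is not part of transformers), and averaging the last four layers and then taking the CLS token (or the mean over patch tokens) are just common heuristics, not the only choice. To keep the example self-contained and runnable offline, it builds a small, randomly initialized ViTModel from a ViTConfig instead of downloading the checkpoint; the indexing works the same way on the outputs of the pretrained model above.

```python
import torch
from transformers import ViTConfig, ViTModel


def extract_layer_features(hidden_states, layers=(-4, -3, -2, -1), use_cls=True):
    """Hypothetical helper: combine selected layers into one vector per image.

    hidden_states: tuple of (batch, seq_len, hidden_size) tensors,
    where index 0 is the embedding output and index i is encoder layer i.
    """
    # Stack the chosen layers: (n_layers, batch, seq_len, hidden_size)
    selected = torch.stack([hidden_states[i] for i in layers], dim=0)
    # Average across the chosen layers: (batch, seq_len, hidden_size)
    averaged = selected.mean(dim=0)
    if use_cls:
        # Token 0 is the [CLS] token in ViT
        return averaged[:, 0]
    # Otherwise mean-pool over the patch tokens (everything after [CLS])
    return averaged[:, 1:].mean(dim=1)


# Small random model for illustration only (assumed config, not the checkpoint above)
torch.manual_seed(0)
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=64,
    image_size=32,
    patch_size=8,
)
model = ViTModel(config).eval()
pixel_values = torch.randn(2, 3, 32, 32)  # a fake batch of 2 RGB images

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, output_hidden_states=True)

print(len(outputs.hidden_states))      # 5: embedding output + 4 encoder layers
print(outputs.hidden_states[2].shape)  # output of encoder layer 2: (batch, 1 + 16 patches, hidden)
features = extract_layer_features(outputs.hidden_states)
print(features.shape)                  # one 32-dim vector per image
```

With the pretrained google/vit-base-patch16-224-in21k model from the code above, outputs.hidden_states has 13 entries (embeddings plus 12 encoder layers), each of shape (1, 197, 768), so extract_layer_features(outputs.hidden_states) would give a (1, 768) vector. Note that ViTModel applies its final layer norm only to last_hidden_state, so hidden_states[-1] is not necessarily identical to last_hidden_state.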