Getting Q, K, V matrices of a ViT

Mleendox · July 5, 2023, 5:42pm

I'm doing my research on the paper [2207.05273] Cross-Architecture Knowledge Distillation (arxiv.org). For the teacher transformer network, an input image is passed through its transformer blocks, and after several transformer blocks the feature $h_T \in \mathbb{R}^{N \times (3hw)}$ is generated. So when using, for example, `google_vit = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')`, how do I get the feature $h_T$, or rather the query Q, key K and value V of the transformer teacher?
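Here is a minimal sketch. It assumes a recent Hugging Face `transformers` where each `ViTLayer` exposes its self-attention as `layer.attention.attention`, with `query`, `key` and `value` `nn.Linear` projections (check these attribute names against your installed version). The sketch registers PyTorch forward hooks on those projections and records their outputs. To run offline it uses a tiny, randomly initialised `ViTConfig`. For the real teacher, use `from_pretrained(...)` instead and preprocess images with `ViTImageProcessor`. The final concatenation into `h_t` is only a hypothetical reading of $h_T$, so check the paper for which block and layout it actually uses.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# Tiny randomly initialised ViT so the sketch runs without downloading weights.
# For the real teacher you would instead use:
#   model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
config = ViTConfig(
    image_size=32,
    patch_size=16,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
)
model = ViTForImageClassification(config).eval()

# Maps "layer{i}.{query|key|value}" -> tensor of shape (batch, N, hidden_size)
captured = {}

def save_output(name):
    # Forward hook: stores the output of the hooked nn.Linear projection.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

handles = []
for i, layer in enumerate(model.vit.encoder.layer):
    self_attn = layer.attention.attention  # self-attention module holding query/key/value
    for proj in ("query", "key", "value"):
        module = getattr(self_attn, proj)
        handles.append(module.register_forward_hook(save_output(f"layer{i}.{proj}")))

pixel_values = torch.randn(1, 3, config.image_size, config.image_size)
with torch.no_grad():
    model(pixel_values=pixel_values)

# Remove the hooks so later forward passes are unaffected.
for handle in handles:
    handle.remove()

last = config.num_hidden_layers - 1
q = captured[f"layer{last}.query"]
k = captured[f"layer{last}.key"]
v = captured[f"layer{last}.value"]
print(q.shape)  # torch.Size([1, 5, 64]): 4 patches + [CLS] token, hidden_size 64

# Hypothetical: concatenate Q, K, V along the feature axis -> (batch, N, 3 * hidden_size)
h_t = torch.cat([q, k, v], dim=-1)
```

The captured tensors have shape (batch, N, hidden_size) before being split into heads, where N is the number of patches plus the [CLS] token (197 for `vit-base-patch16-224` at 224×224, with hidden_size 768). Passing `output_attentions=True` to the model returns the attention weights, not Q, K and V themselves, which is why hooks are used here.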
